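The idea is to allow schema evolution. Old partitions retain their old schema, while new partitions can use a changed schema (including the INPUT/OUTPUT format, SerDe, etc.). That is why each partition keeps its own storage descriptor and column list, even when they are currently identical to the table's. I think some of this is already allowed.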
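
To make the counts in the quoted message concrete, here is a minimal sketch using an in-memory SQLite database rather than a real Hive metastore. The table layout (tbls, partitions, columns, each carrying an SD_ID) is an assumption modeled on the queries in this thread and on the point that storage descriptors can be attached to partitions; a real metastore schema may differ by Hive version. The helper names build_toy_metastore and sd_counts are hypothetical.

```python
import sqlite3


def build_toy_metastore():
    """Build a toy, in-memory stand-in for the metastore (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tbls (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT, SD_ID INTEGER);
        CREATE TABLE partitions (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER,
                                 PART_NAME TEXT, SD_ID INTEGER);
        CREATE TABLE columns (SD_ID INTEGER, COLUMN_NAME TEXT,
                              TYPE_NAME TEXT, INTEGER_IDX INTEGER);
        """
    )
    # One table whose current schema is (a, b, c), stored under SD 1.
    conn.execute("INSERT INTO tbls VALUES (1, 'events', 1)")
    # Two partitions, each with its own storage descriptor.
    conn.executemany(
        "INSERT INTO partitions VALUES (?, ?, ?, ?)",
        [(1, 1, "ds=2010-05-25", 2), (2, 1, "ds=2010-05-26", 3)],
    )
    # SD 2 is an old partition created before column c was added;
    # SD 1 (table) and SD 3 (new partition) carry the evolved schema.
    schemas = {1: ["a", "b", "c"], 2: ["a", "b"], 3: ["a", "b", "c"]}
    for sd_id, names in schemas.items():
        for idx, name in enumerate(names):
            conn.execute(
                "INSERT INTO columns VALUES (?, ?, 'string', ?)",
                (sd_id, name, idx),
            )
    return conn


def sd_counts(conn):
    """Count distinct storage descriptors per table, plus unreferenced ones."""
    def one(sql):
        return conn.execute(sql).fetchone()[0]

    return {
        "tbls": one("SELECT COUNT(DISTINCT SD_ID) FROM tbls"),
        "partitions": one("SELECT COUNT(DISTINCT SD_ID) FROM partitions"),
        "columns": one("SELECT COUNT(DISTINCT SD_ID) FROM columns"),
        # Column rows whose SD_ID is used by neither a table nor a partition.
        "orphans": one(
            """
            SELECT COUNT(DISTINCT SD_ID) FROM columns
            WHERE SD_ID NOT IN (SELECT SD_ID FROM tbls)
              AND SD_ID NOT IN (SELECT SD_ID FROM partitions)
            """
        ),
    }


if __name__ == "__main__":
    print(sd_counts(build_toy_metastore()))
```

In this toy model the columns table references more storage descriptors than tbls does, and the difference is fully accounted for by partitions, so no row is orphaned. Running the equivalent of the "orphans" query against a real metastore would be one way to check whether the extra SD_IDs are garbage or simply belong to partitions.
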

On May 26, 2010, at 8:43 PM, Ted Xu wrote:

Hi Ashish,

Thank you for your reply; that explains my problem. I also find that the columns related to a given partition are identical to the columns related to the other partitions in the same table. So what is the benefit of such a redundant design?

2010/5/27 Ashish Thusoo <[email protected]>

Do you have partitions in the table? Storage descriptors can also be associated with partitions.

Ashish

________________________________
From: Ted Xu [mailto:[email protected]]
Sent: Wednesday, May 26, 2010 5:26 AM
To: [email protected]
Subject: Garbage data in metadata store?

Hi all,

I want to replicate the Hive metadata to another place, but I found that my Hive metadata contains a large portion of data that looks like garbage.

As I understand it, the Hive metadata store uses the 'Storage Descriptor' to keep the relationship between tables and columns. But the distinct 'SD_ID' counts in the tables 'TBLS' and 'COLUMNS' do not match, as shown below:

mysql> select count(distinct SD_ID) from tbls;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                   764 |
+-----------------------+
1 row in set (0.00 sec)

mysql> select count(distinct SD_ID) from columns;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                  5219 |
+-----------------------+
1 row in set (0.05 sec)

Does that mean the table 'columns' contains garbage data? If so, how is it generated?

--
Best Regards,
Ted Xu
