Hi Ashish,

Thank you for your reply; that explains my problem.
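
For anyone else who hits the same mismatch, here is a rough sketch of the check, run against a toy in-memory database rather than the real metastore. The table and column names TBLS, COLUMNS and SD_ID are the ones from this thread; the PARTITIONS table with its own SD_ID column is my assumption based on the explanation that storage descriptors can also belong to partitions, and the sample rows are made up. If the count of orphaned SD_IDs is zero, nothing in COLUMNS is garbage.

```python
import sqlite3

# Toy stand-in for the metastore: only the columns needed for the count.
# TBLS, COLUMNS and SD_ID come from the thread; the PARTITIONS layout is
# an assumption made for illustration, and all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBLS       (TBL_ID INTEGER, SD_ID INTEGER);
    CREATE TABLE PARTITIONS (PART_ID INTEGER, TBL_ID INTEGER, SD_ID INTEGER);
    CREATE TABLE COLUMNS    (SD_ID INTEGER, COLUMN_NAME TEXT);

    -- one table (SD 1) with two partitions (SD 2 and SD 3)
    INSERT INTO TBLS VALUES (1, 1);
    INSERT INTO PARTITIONS VALUES (10, 1, 2), (11, 1, 3);

    -- every storage descriptor carries its own copy of the column list
    INSERT INTO COLUMNS VALUES (1, 'id'), (1, 'name'),
                               (2, 'id'), (2, 'name'),
                               (3, 'id'), (3, 'name');
""")


def count_distinct_sd(table):
    # `table` is one of the fixed names above, never user input.
    query = f"SELECT COUNT(DISTINCT SD_ID) FROM {table}"
    return conn.execute(query).fetchone()[0]


table_sds = count_distinct_sd("TBLS")
partition_sds = count_distinct_sd("PARTITIONS")
column_sds = count_distinct_sd("COLUMNS")

# SD_IDs in COLUMNS that belong to neither a table nor a partition.
orphans = conn.execute("""
    SELECT COUNT(DISTINCT SD_ID) FROM COLUMNS
    WHERE SD_ID NOT IN (SELECT SD_ID FROM TBLS)
      AND SD_ID NOT IN (SELECT SD_ID FROM PARTITIONS)
""").fetchone()[0]

print(table_sds, partition_sds, column_sds, orphans)
```

The two SELECT statements should carry over to the MySQL metastore more or less unchanged, provided the PARTITIONS assumption holds for the schema version in use.
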
I also found that the columns associated with one partition are identical to the columns associated with the other partitions in the same table. So what is the benefit of such a redundant design?

2010/5/27 Ashish Thusoo <[email protected]>

> Do you have partitions in the table? Storage descriptors can also be
> associated with partitions.
>
> Ashish
>
> ------------------------------
> *From:* Ted Xu [mailto:[email protected]]
> *Sent:* Wednesday, May 26, 2010 5:26 AM
> *To:* [email protected]
> *Subject:* Garbage data in metadata store?
>
> Hi all,
>
> I want to replicate hive metadata to another place, while I found my hive
> metadata contains a big portion of data looks like garbage.
>
> In my understanding, the hive metadata store use 'Storage Descriptor' to
> keep relationship between tables and columns. But the 'SD_ID' columns in
> table 'TBLS' and 'COLUMNS' are unbalanced in count, as shown below:
>
> mysql> select count(distinct SD_ID) from tbls;
> +-----------------------+
> | count(distinct SD_ID) |
> +-----------------------+
> |                   764 |
> +-----------------------+
> 1 row in set (0.00 sec)
>
> mysql> select count(distinct SD_ID) from columns;
> +-----------------------+
> | count(distinct SD_ID) |
> +-----------------------+
> |                  5219 |
> +-----------------------+
> 1 row in set (0.05 sec)
>
> Is that mean table 'columns' contains garbage data? If so, then how it is
> generated?
>
> --
> Best Regards,
> Ted Xu
>

--
Best Regards,
Ted Xu
