The idea is to allow schema evolution. Old partitions retain their old schema, 
but new partitions can use a changed schema (including input/output format, 
SerDe, etc.). I think some of this is already supported.
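
To make the point concrete, here is a minimal sketch of why the distinct SD_ID 
count in 'columns' can exceed the one in 'tbls' without any garbage being 
present. It uses an in-memory SQLite database as a stand-in for the metastore. 
The table layouts are heavily simplified assumptions (in particular, that 
'partitions' carries its own SD_ID column), and the function names and sample 
data are hypothetical, so check them against your own metastore schema before 
relying on the queries.

```python
import sqlite3


def build_demo_metastore():
    """Build a toy, simplified stand-in for a few metastore tables (assumed layout)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sds (SD_ID INTEGER PRIMARY KEY, INPUT_FORMAT TEXT);
        CREATE TABLE tbls (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER, TBL_NAME TEXT);
        CREATE TABLE partitions (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER,
                                 SD_ID INTEGER, PART_NAME TEXT);
        CREATE TABLE columns (SD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT);

        -- One table (SD 1) with two partitions, each with its own SD (2 and 3).
        INSERT INTO sds VALUES (1, 'TextInputFormat'),
                               (2, 'TextInputFormat'),
                               (3, 'SequenceFileInputFormat');
        INSERT INTO tbls VALUES (1, 1, 'page_views');
        INSERT INTO partitions VALUES (1, 1, 2, 'ds=2010-05-25'),
                                      (2, 1, 3, 'ds=2010-05-26');

        -- SD 2 is the old partition (two columns); SD 1 and SD 3 carry the
        -- evolved schema with an extra column.
        INSERT INTO columns VALUES
            (1, 'user_id', 'int'), (1, 'url', 'string'), (1, 'referrer', 'string'),
            (2, 'user_id', 'int'), (2, 'url', 'string'),
            (3, 'user_id', 'int'), (3, 'url', 'string'), (3, 'referrer', 'string');
    """)
    return con


def count_distinct_sd(con, table):
    """Same query as in the original question, for one of the known tables."""
    if table not in ("tbls", "partitions", "columns"):
        raise ValueError("unexpected table name: " + table)
    return con.execute(
        "SELECT COUNT(DISTINCT SD_ID) FROM " + table
    ).fetchone()[0]


def count_referenced_sd(con):
    """Count storage descriptors referenced by either a table or a partition."""
    return con.execute("""
        SELECT COUNT(*) FROM (
            SELECT SD_ID FROM tbls
            UNION
            SELECT SD_ID FROM partitions
        ) AS referenced
    """).fetchone()[0]


def orphan_sd_ids(con):
    """List SD_IDs in 'columns' that no table and no partition points to."""
    rows = con.execute("""
        SELECT DISTINCT c.SD_ID FROM columns c
        WHERE c.SD_ID NOT IN (SELECT SD_ID FROM tbls)
          AND c.SD_ID NOT IN (SELECT SD_ID FROM partitions)
        ORDER BY c.SD_ID
    """).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    con = build_demo_metastore()
    print("distinct SD_ID in tbls:   ", count_distinct_sd(con, "tbls"))
    print("distinct SD_ID in columns:", count_distinct_sd(con, "columns"))
    print("referenced by tbls or partitions:", count_referenced_sd(con))
    print("orphan SD_IDs:", orphan_sd_ids(con))
```

In this toy data the two counts from the original question differ, yet every 
SD_ID in 'columns' is accounted for once partitions are included. If the 
orphan query returns rows against a real metastore, those rows would be the 
candidates worth investigating as leftover data.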

On May 26, 2010, at 8:43 PM, Ted Xu wrote:

Hi Ashish,

Thank you for your reply, that explains my problem.

I also found that the columns associated with one partition are identical to 
the columns associated with the other partitions in the same table. So what is 
the benefit of such a redundant design?

2010/5/27 Ashish Thusoo <[email protected]>
Do you have partitions in the table? Storage descriptors can also be associated 
with partitions.

Ashish

________________________________
From: Ted Xu [mailto:[email protected]]
Sent: Wednesday, May 26, 2010 5:26 AM
To: [email protected]
Subject: Garbage data in metadata store?

Hi all,

I want to replicate the Hive metadata to another location, but I found that my 
Hive metadata contains a large portion of data that looks like garbage.

As I understand it, the Hive metadata store uses a 'Storage Descriptor' to 
maintain the relationship between tables and columns. But the number of 
distinct 'SD_ID' values in the tables 'TBLS' and 'COLUMNS' does not match, as 
shown below:

mysql> select count(distinct SD_ID) from tbls;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                   764 |
+-----------------------+
1 row in set (0.00 sec)

mysql> select count(distinct SD_ID) from columns;
+-----------------------+
| count(distinct SD_ID) |
+-----------------------+
|                  5219 |
+-----------------------+
1 row in set (0.05 sec)

Does that mean the table 'columns' contains garbage data? If so, how is it 
generated?

--
Best Regards,
Ted Xu



