Hey Ed,

Your thinking is correct and has been implemented in
https://issues.apache.org/jira/browse/HIVE-2246
Time to upgrade to 0.8 :)

Thanks,
Ashutosh

On Wed, Apr 11, 2012 at 07:53, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Hey all. Our metastore in MySQL is fairly large, over 12GB. All the
> storage here is in the columns table. It seems that each column is
> stored for each partition/storage descriptor as a one-many relationship.
>
> In our case all the partitions have the same column definition. My
> thinking: should the relationship from columns->partition/storage
> descriptor be a many<->many? In this way we only store the column once
> and the current column table can reference the primary key of this
> column. This should bring the size of this table down really
> drastically.
>
> Since every other table in the metastore is so small, this huge columns
> table looks like the only scalability choke point we have.
>
> Edward
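
For readers following the thread, here is a rough sketch of the idea Edward describes: store a column list once and let every partition's storage descriptor point at it, instead of repeating the columns per descriptor. It uses Python's sqlite3 module with an in-memory database purely for illustration. The table names (columns_per_sd, column_sets, columns_shared, storage_descs) and the function count_column_rows are hypothetical; this is not the actual metastore schema and not necessarily how HIVE-2246 lays things out.

```python
import sqlite3

def count_column_rows(num_partitions, columns):
    """Return (rows when columns repeat per descriptor, rows when shared).

    All table names below are hypothetical, chosen only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
        -- Layout 1: every storage descriptor carries its own column rows.
        CREATE TABLE columns_per_sd (
            sd_id INTEGER, col_name TEXT, col_type TEXT, idx INTEGER);

        -- Layout 2: a column set is stored once and referenced by id.
        CREATE TABLE column_sets (cd_id INTEGER PRIMARY KEY);
        CREATE TABLE columns_shared (
            cd_id INTEGER REFERENCES column_sets(cd_id),
            col_name TEXT, col_type TEXT, idx INTEGER);
        CREATE TABLE storage_descs (
            sd_id INTEGER PRIMARY KEY,
            cd_id INTEGER REFERENCES column_sets(cd_id));
    """)

    # Layout 1: one row per (partition, column) pair.
    for sd_id in range(num_partitions):
        cur.executemany(
            "INSERT INTO columns_per_sd VALUES (?, ?, ?, ?)",
            [(sd_id, name, typ, i) for i, (name, typ) in enumerate(columns)])

    # Layout 2: one column set, shared by all partitions.
    cur.execute("INSERT INTO column_sets (cd_id) VALUES (1)")
    cur.executemany(
        "INSERT INTO columns_shared VALUES (1, ?, ?, ?)",
        [(name, typ, i) for i, (name, typ) in enumerate(columns)])
    cur.executemany(
        "INSERT INTO storage_descs VALUES (?, 1)",
        [(sd_id,) for sd_id in range(num_partitions)])

    per_sd = cur.execute("SELECT COUNT(*) FROM columns_per_sd").fetchone()[0]
    shared = cur.execute("SELECT COUNT(*) FROM columns_shared").fetchone()[0]
    conn.close()
    return per_sd, shared

if __name__ == "__main__":
    cols = [("id", "bigint"), ("name", "string"), ("ts", "string")]
    print(count_column_rows(1000, cols))  # prints (3000, 3)
```

With identical column definitions across partitions, the repeated layout grows as partitions x columns, while the shared layout only grows with the number of distinct column sets. The storage descriptor table still has one row per partition, but those rows are small.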