Hey all. Our metastore in MySQL is fairly large, over 12GB. All of that storage is in the columns table. It seems that each column is stored once per partition/storage descriptor, as a one-to-many relationship.
In our case all the partitions have the same column definitions. My thinking: should the relationship between columns and partitions/storage descriptors be many<->many instead? That way we store each column only once, and the current columns table can reference the primary key of that column. This should bring the size of this table down drastically. Since every other table in the metastore is so small, this huge columns table looks like the only scalability choke point we have.

Edward
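
To make the idea concrete, here is a rough sketch of the two layouts. It is only an illustration: the table names (cols_per_sd, col_defs, sd_col_link) and the sample columns are made up and are not the real metastore schema, and it uses an in-memory SQLite database from Python instead of MySQL so it can be run anywhere.

```python
import sqlite3


def compare_layouts(num_partitions, columns):
    """Return (definitions stored per-partition, definitions stored once, link rows).

    All table names here are hypothetical and only illustrate the idea.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Layout as described in the post: name/type repeated per storage descriptor.
        CREATE TABLE cols_per_sd (sd_id INTEGER, idx INTEGER,
                                  col_name TEXT, col_type TEXT);
        -- Proposed layout: each distinct column definition is stored once...
        CREATE TABLE col_defs (col_id INTEGER PRIMARY KEY,
                               col_name TEXT, col_type TEXT,
                               UNIQUE (col_name, col_type));
        -- ...and a narrow link table references it by primary key.
        CREATE TABLE sd_col_link (sd_id INTEGER, idx INTEGER,
                                  col_id INTEGER REFERENCES col_defs (col_id));
    """)
    for sd_id in range(num_partitions):
        for idx, (name, col_type) in enumerate(columns):
            # Current approach: write the full definition again for this partition.
            conn.execute("INSERT INTO cols_per_sd VALUES (?, ?, ?, ?)",
                         (sd_id, idx, name, col_type))
            # Proposed approach: reuse the definition if it already exists.
            conn.execute("INSERT OR IGNORE INTO col_defs (col_name, col_type) "
                         "VALUES (?, ?)", (name, col_type))
            col_id = conn.execute(
                "SELECT col_id FROM col_defs WHERE col_name = ? AND col_type = ?",
                (name, col_type)).fetchone()[0]
            conn.execute("INSERT INTO sd_col_link VALUES (?, ?, ?)",
                         (sd_id, idx, col_id))

    def count(table):
        return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    result = (count("cols_per_sd"), count("col_defs"), count("sd_col_link"))
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("id", "bigint"), ("name", "string"), ("dt", "string")]
    print(compare_layouts(1000, sample))  # prints (3000, 3, 3000)
```

With 1000 partitions and 3 identical columns, the name and type strings are stored 3000 times in the first layout and 3 times in the second. The link table still has one row per partition per column, but each row holds only integers, so any saving would come from not repeating the column names and types.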