[
https://issues.apache.org/jira/browse/HIVE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor reassigned HIVE-24870:
-----------------------------------
Assignee: (was: László Bodor)
> Metastore: cleanup unused column descriptors asynchronously in batches
> ----------------------------------------------------------------------
>
> Key: HIVE-24870
> URL: https://issues.apache.org/jira/browse/HIVE-24870
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Priority: Major
>
> HIVE-2246 introduced CD_ID to optimize the metastore database (details there).
> ObjectStore.removeUnusedColumnDescriptor is a maintenance task that is called
> in every alter-partition type of operation. During replication,
> alterPartition can be a heavy path, and it gains nothing from running
> removeUnusedColumnDescriptor immediately. Moreover, the method issues a query
> of the following kind:
> {code}
> select count(*) from "SDS" where "CD_ID"=12345;
> {code}
> This query can take a relatively long time compared to the alter partition
> operation itself.
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4982
> {code}
> query = pm.newQuery("select count(1) from "
>     + "org.apache.hadoop.hive.metastore.model.MStorageDescriptor "
>     + "where (this.cd == inCD)");
> query.declareParameters("MColumnDescriptor inCD");
> long count = ((Long)query.execute(oldCD)).longValue();
> //if no other SD references this CD, we can throw it out.
> if (count == 0) {
> {code}
> My proposal is to run this cleanup asynchronously and in batches, at a
> configurable interval.
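
To make the proposal concrete, here is a minimal sketch of how deferred, batched cleanup could look. It is not Hive code: DeferredCdCleaner, markCandidate, runOnce and the two callbacks are hypothetical names, and the callbacks stand in for the count query on SDS and for the actual column descriptor delete. The alter-partition path would only record the old CD_ID, and a background task would later check and remove candidates in bounded batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.LongPredicate;

/** Hypothetical sketch: none of these names exist in Hive. */
final class DeferredCdCleaner {
    // CD_IDs that may have become unreferenced; a set, so duplicates collapse.
    private final Set<Long> pending = ConcurrentHashMap.newKeySet();
    // Assumed stand-in for "does any SD still reference this CD?".
    private final LongPredicate isReferenced;
    // Assumed stand-in for deleting the column descriptor.
    private final LongConsumer deleteCd;
    private final int batchSize;

    DeferredCdCleaner(LongPredicate isReferenced, LongConsumer deleteCd, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.isReferenced = isReferenced;
        this.deleteCd = deleteCd;
        this.batchSize = batchSize;
    }

    /** Cheap call for the alter-partition path: remember the CD, do no query. */
    void markCandidate(long cdId) {
        pending.add(cdId);
    }

    /** Checks at most batchSize candidates; returns how many CDs were deleted. */
    int runOnce() {
        List<Long> batch = new ArrayList<>(batchSize);
        for (Long id : pending) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(id);
        }
        int deleted = 0;
        for (Long id : batch) {
            pending.remove(id);
            // A real implementation would need the check and the delete
            // to happen in one transaction to avoid races with new SDs.
            if (!isReferenced.test(id)) {
                deleteCd.accept(id);
                deleted++;
            }
        }
        return deleted;
    }

    int pendingCount() {
        return pending.size();
    }

    /** Runs runOnce repeatedly with the given delay between runs. */
    ScheduledExecutorService start(long interval, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // Swallow so that one failed batch does not cancel the schedule.
            }
        }, interval, interval, unit);
        return scheduler;
    }
}
```

In this sketch the interval and batch size would come from configuration, and candidates are kept only in memory, so they are lost on restart. A periodic full scan for unreferenced column descriptors would be one way to cover that gap.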