[ https://issues.apache.org/jira/browse/HIVE-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313282#comment-14313282 ]
Mithun Radhakrishnan commented on HIVE-9588:
--------------------------------------------

Another minor update: the numbers quoted above are cut in half for EXTERNAL tables, whose partition data is not deleted when the partitions are dropped. So half the problem is the iterative deletion of partition directories.

1. In the short term, perhaps we could add an HCatClient.dropPartitions() overload that takes a deleteData argument, just as HiveMetaStoreClient.drop_partitions_req() does. This way, the caller can choose whether to delete the underlying data. (This should benefit data-loading programs like GDM/Falcon.)
2. In the long term, we should consider classifying the partition directories so that we can drop their common parent, rather than deleting each partition directory individually.

> Reimplement HCatClientHMSImpl.dropPartitions() with HMSC.dropPartitions()
> -------------------------------------------------------------------------
>
>                 Key: HIVE-9588
>                 URL: https://issues.apache.org/jira/browse/HIVE-9588
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Metastore, Thrift API
>    Affects Versions: 0.14.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-9588.1.patch, HIVE-9588.2.patch
>
>
> {{HCatClientHMSImpl.dropPartitions()}} currently has an embarrassingly
> inefficient implementation. The partial partition-spec is converted into a
> filter-string. The matching partitions are fetched from the server, and then
> dropped one by one.
> Here's a reimplementation that uses the {{ExprNode}}-based
> {{HiveMetaStoreClient.dropPartitions()}}. It cuts out the excessive
> back-and-forth between the HMS and the client. It also reduces the
> client-side memory footprint, since the partitions to be dropped no longer
> have to be loaded into memory first.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
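Regarding the short-term proposal (a deleteData overload): one way to add such an overload without breaking existing callers is sketched below in Java. PartitionDropClient and RecordingClient are hypothetical names. They are not the real HCatClient, which is an abstract class. The sketch only shows the delegation pattern. The existing four-argument call keeps today's behaviour (data is deleted), and the new overload lets a caller such as a data-loading program opt out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client interface, NOT the real org.apache.hive.hcatalog.api.HCatClient.
// It illustrates adding a deleteData flag while preserving existing behaviour.
interface PartitionDropClient {

    /** Existing-style call: drops the matching partitions and deletes their data. */
    default void dropPartitions(String db, String table,
                                Map<String, String> partialSpec, boolean ifExists) {
        // Old callers keep the old semantics: data is deleted.
        dropPartitions(db, table, partialSpec, ifExists, /* deleteData= */ true);
    }

    /** Proposed-style overload: the caller decides whether the data is removed. */
    void dropPartitions(String db, String table,
                        Map<String, String> partialSpec, boolean ifExists, boolean deleteData);
}

// Toy implementation that only records the calls it receives.
final class RecordingClient implements PartitionDropClient {
    final List<String> calls = new ArrayList<>();

    @Override
    public void dropPartitions(String db, String table,
                               Map<String, String> partialSpec, boolean ifExists, boolean deleteData) {
        calls.add(db + "." + table + " " + partialSpec
                + " ifExists=" + ifExists + " deleteData=" + deleteData);
    }
}
```

Making the old signature delegate to the new one keeps a single code path on the server side. Only the default value of the flag differs between the two entry points.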
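The long-term idea of dropping a common parent directory, instead of each partition directory, can be sketched as a counting pass over partition paths. This is a minimal sketch that rests on several assumptions:

* Partition locations are relative paths laid out under the table root (e.g. dt=1/region=us).
* The drop set is a subset of the table's partitions.
* A directory contains nothing besides the listed partitions, so deleting it removes exactly their data.

CommonParentSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: pick the highest directories that can be deleted wholesale.
// Assumes relative partition paths, dropped is a subset of allPartitions, and
// directories contain nothing but partition data.
final class CommonParentSketch {

    private CommonParentSketch() {}

    /** Every ancestor directory of a path plus the path itself: "a/b/c" -> [a, a/b, a/b/c]. */
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String part : path.split("/")) {
            if (sb.length() > 0) sb.append('/');
            sb.append(part);
            out.add(sb.toString());
        }
        return out;
    }

    /** Non-overlapping directories whose deletion removes exactly the dropped partitions. */
    static SortedSet<String> directoriesToDelete(Set<String> allPartitions, Set<String> dropped) {
        // How many partitions live under each directory in total...
        Map<String, Integer> total = new HashMap<>();
        for (String p : allPartitions) {
            for (String dir : prefixes(p)) total.merge(dir, 1, Integer::sum);
        }
        // ...and how many of those are being dropped.
        Map<String, Integer> droppedCount = new HashMap<>();
        for (String p : dropped) {
            for (String dir : prefixes(p)) droppedCount.merge(dir, 1, Integer::sum);
        }
        SortedSet<String> result = new TreeSet<>();
        for (String p : dropped) {
            // Shallowest directory first: the first one fully covered by the drop set
            // is the highest point at which a single delete is safe.
            for (String dir : prefixes(p)) {
                if (droppedCount.get(dir).equals(total.get(dir))) {
                    result.add(dir);
                    break;
                }
            }
        }
        return result;
    }
}
```

Consider a table whose partitions cover dt in {1, 2} and region in {us, eu}. Dropping dt=1/region=us, dt=1/region=eu and dt=2/region=us yields two deletions, dt=1 and dt=2/region=us, instead of three. The table root itself is never returned, because only paths at depth one or deeper are considered.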
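For reference, here is roughly what the first step of the old code path looks like, where the partial partition-spec is turned into a filter-string. toFilterString is a hypothetical helper, not the actual HCatClientHMSImpl code. It assumes string-typed partition keys and does not escape quote characters inside values.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper illustrating the partial-spec -> filter-string step.
// Not the real HCatClientHMSImpl implementation.
final class PartitionFilterSketch {

    private PartitionFilterSketch() {}

    /** Joins each key="value" pair with " AND "; an empty spec yields "". */
    static String toFilterString(Map<String, String> partialSpec) {
        StringJoiner filter = new StringJoiner(" AND ");
        for (Map.Entry<String, String> e : partialSpec.entrySet()) {
            // Emitted as a quoted string literal (assumes string-typed partition keys;
            // quotes inside values are not escaped in this sketch).
            filter.add(e.getKey() + "=\"" + e.getValue() + "\"");
        }
        return filter.toString();
    }
}
```

Suppose a LinkedHashMap holds dt=20150201 followed by region=us. For that input, toFilterString returns dt="20150201" AND region="us". The old path then fetched every partition matching that filter before dropping them individually.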
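The reduction in back-and-forth can be illustrated with a toy in-memory model that counts calls. ToyMetastore and DropStrategies are hypothetical. The real HiveMetaStoreClient.dropPartitions() takes a serialized ExprNode expression rather than a map, so this model captures only the call pattern. The old path costs 1 + N calls and holds all N matching partitions in client memory. The bulk path costs a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a metastore that counts "RPCs".
// A partial spec matches a partition if every key/value pair in the spec is present.
final class ToyMetastore {
    final List<Map<String, String>> partitions = new ArrayList<>();
    int rpcCount = 0;

    private static boolean matches(Map<String, String> partition, Map<String, String> partialSpec) {
        return partition.entrySet().containsAll(partialSpec.entrySet());
    }

    /** One RPC: return every partition matching the partial spec. */
    List<Map<String, String>> listPartitions(Map<String, String> partialSpec) {
        rpcCount++;
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> p : partitions) {
            if (matches(p, partialSpec)) out.add(p);
        }
        return out;
    }

    /** One RPC: drop a single, fully specified partition. */
    void dropPartition(Map<String, String> fullSpec) {
        rpcCount++;
        partitions.remove(fullSpec);
    }

    /** One RPC: drop every matching partition on the "server" side; returns how many. */
    int dropPartitions(Map<String, String> partialSpec) {
        rpcCount++;
        int before = partitions.size();
        partitions.removeIf(p -> matches(p, partialSpec));
        return before - partitions.size();
    }
}

final class DropStrategies {
    private DropStrategies() {}

    /** Old pattern: fetch the matches, then drop them one at a time (1 + N calls). */
    static void dropOneByOne(ToyMetastore ms, Map<String, String> partialSpec) {
        for (Map<String, String> p : ms.listPartitions(partialSpec)) {
            ms.dropPartition(p);
        }
    }

    /** New pattern: a single bulk drop evaluated by the "server" (1 call). */
    static void dropInBulk(ToyMetastore ms, Map<String, String> partialSpec) {
        ms.dropPartitions(partialSpec);
    }
}
```

Take a table with two dt values and three regions. Dropping dt=1 takes 4 calls one by one, but only 1 call in bulk.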