[ 
https://issues.apache.org/jira/browse/HIVE-9588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313282#comment-14313282
 ] 

Mithun Radhakrishnan commented on HIVE-9588:
--------------------------------------------

Another minor update: 

The numbers quoted above are cut in half for EXTERNAL tables. In other words,
about half the problem is the iterative deletion of partition directories.

1. In the short term, perhaps we could add an HCatClient.dropPartitions()
overload that takes a deleteData argument, just as
HiveMetaStoreClient.drop_partitions_req() does. This way, the caller can choose
whether to delete the underlying data. (This should benefit data-loading
programs like GDM/Falcon.)
2. In the long term, we should consider grouping the partition directories so
that we delete their common parent directory, rather than each partition
directory individually.
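
To make the long-term idea in (2) concrete, here is a rough, standalone sketch
of how the directories might be grouped. It is not Hive code: the class
PartitionDirCollapser, its collapse() method and the example paths are all
hypothetical, and it only looks one level up from each partition directory. It
also assumes that a parent directory holds nothing except the partition
directories listed for it, which a real implementation would have to verify
before deleting the parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hive or HCatalog.
class PartitionDirCollapser {

    /** Returns the parent of a '/'-separated path, or "" if it has none. */
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "" : path.substring(0, slash);
    }

    /**
     * Given every partition directory of a table and the subset to be dropped,
     * returns the directories to delete. When all partition directories under
     * one parent are being dropped, the parent is returned in their place.
     */
    static List<String> collapse(List<String> allPartitionDirs, Set<String> toDrop) {
        // Group every known partition directory by its immediate parent.
        Map<String, List<String>> byParent = new LinkedHashMap<>();
        for (String dir : allPartitionDirs) {
            byParent.computeIfAbsent(parentOf(dir), k -> new ArrayList<>()).add(dir);
        }

        List<String> deletions = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byParent.entrySet()) {
            String parent = entry.getKey();
            List<String> children = entry.getValue();
            if (!parent.isEmpty() && toDrop.containsAll(children)) {
                // Every partition under this parent goes away: one delete suffices.
                deletions.add(parent);
            } else {
                // Only some partitions are dropped: delete those individually.
                for (String child : children) {
                    if (toDrop.contains(child)) {
                        deletions.add(child);
                    }
                }
            }
        }
        return deletions;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
            "/warehouse/t/dt=2015-01-01/region=us",
            "/warehouse/t/dt=2015-01-01/region=eu",
            "/warehouse/t/dt=2015-01-02/region=us",
            "/warehouse/t/dt=2015-01-02/region=eu");
        Set<String> drop = new HashSet<>(Arrays.asList(
            "/warehouse/t/dt=2015-01-01/region=us",
            "/warehouse/t/dt=2015-01-01/region=eu",
            "/warehouse/t/dt=2015-01-02/region=us"));
        System.out.println(collapse(all, drop));
    }
}
```

In the example, both partitions under dt=2015-01-01 are dropped, so that
directory is deleted once, while dt=2015-01-02 still has a surviving partition
and only its region=us directory is deleted.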

> Reimplement HCatClientHMSImpl.dropPartitions() with HMSC.dropPartitions()
> -------------------------------------------------------------------------
>
>                 Key: HIVE-9588
>                 URL: https://issues.apache.org/jira/browse/HIVE-9588
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Metastore, Thrift API
>    Affects Versions: 0.14.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-9588.1.patch, HIVE-9588.2.patch
>
>
> {{HCatClientHMSImpl.dropPartitions()}} currently has an embarrassingly 
> inefficient implementation. The partial partition-spec is converted into a 
> filter-string. The partitions are fetched from the server, and then dropped 
> one by one.
> Here's a reimplementation that uses the {{ExprNode}}-based 
> {{HiveMetaStoreClient.dropPartitions()}}. It cuts out the excessive 
> back-and-forth between the HMS and the client-side. It also reduces the 
> memory footprint (from loading all the partitions that are to be dropped). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)