[
https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pengcheng Xiong updated HIVE-8061:
----------------------------------
Attachment: HIVE-8061.1.patch
Major improvements:
(1) All partition stats updates/inserts are now done in one transaction.
(2) Rather than issuing one query per column per partition (total queries =
#cols * #partitions),
we now use one query to delete everything and then one query to insert
everything. The transaction ensures that this happens atomically (ACID).
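
To illustrate the delete-then-insert pattern described in (2), here is a
minimal Python sketch using sqlite3. It is not the code in the patch: the
table part_col_stats, its columns, and the function names are hypothetical
stand-ins chosen for illustration, and the real metastore schema and backend
differ.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical stats table; NOT the actual Hive metastore schema.
    conn.execute(
        """CREATE TABLE part_col_stats (
               table_name   TEXT NOT NULL,
               part_name    TEXT NOT NULL,
               col_name     TEXT NOT NULL,
               num_nulls    INTEGER,
               num_distinct INTEGER,
               PRIMARY KEY (table_name, part_name, col_name))"""
    )


def replace_partition_col_stats(conn, table_name, rows):
    """Replace column stats for the affected partitions in one transaction.

    rows: list of (part_name, col_name, num_nulls, num_distinct) tuples.
    """
    part_names = sorted({r[0] for r in rows})
    if not part_names:
        return
    in_list = ",".join("?" for _ in part_names)
    values = ",".join("(?, ?, ?, ?, ?)" for _ in rows)
    params = [v for r in rows for v in (table_name, *r)]
    # "with conn" commits if the block succeeds and rolls back on an exception,
    # so the delete and the insert are applied together or not at all.
    with conn:
        # One statement removes the old stats for every affected partition.
        conn.execute(
            "DELETE FROM part_col_stats "
            f"WHERE table_name = ? AND part_name IN ({in_list})",
            [table_name, *part_names],
        )
        # One multi-row statement inserts the new stats.
        conn.execute(
            "INSERT INTO part_col_stats "
            "(table_name, part_name, col_name, num_nulls, num_distinct) "
            f"VALUES {values}",
            params,
        )
```

Running the function twice with the same rows leaves one row per
(partition, column) pair rather than duplicates, and a failed insert leaves
the previous stats in place. A large batch would need to be split into chunks
to stay under the database's bound-parameter limit.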
> improve the speed of col stats update speed
> -------------------------------------------
>
> Key: HIVE-8061
> URL: https://issues.apache.org/jira/browse/HIVE-8061
> Project: Hive
> Issue Type: Improvement
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Priority: Minor
> Attachments: HIVE-8061.1.patch
>
>
> We previously worked on speeding up column stats updates for the partitions
> of a table in https://issues.apache.org/jira/browse/HIVE-7736
> and https://issues.apache.org/jira/browse/HIVE-7876
> Although there was some improvement, the result was only correct on the
> first run; later runs produced duplicate column stats. Thanks to Eugene
> Koifman for his comments.
> We fixed this in https://issues.apache.org/jira/browse/HIVE-7944 by reverting
> the patch.
> This JIRA ticket is another attempt to improve the speed.