[ 
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119953#comment-15119953
 ] 

Pengcheng Xiong commented on HIVE-12763:
----------------------------------------

As per [~jpullokkaran]'s request, I extracted the merging function in 
NumDistinctValueEstimator and posted the performance of merging NDV alone, 
without running Hive. In the extreme case, with 8M partitions, the merging 
time is around 9 seconds. It grows linearly with the number of partitions. 
Moreover, 8M partitions need 16 GB of memory to hold the bit vectors.
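
For illustration only, here is a minimal sketch of why the merge cost is 
linear in the number of partitions. It assumes each partition's estimator is 
a fixed number of bit vectors that are combined by bitwise OR. The class 
NdvMergeSketch, its methods, and the vector sizes are hypothetical and are 
not the NumDistinctValueEstimator API. The last line works out the memory 
per partition implied by the figures above, assuming binary units 
(16 GB = 2^34 bytes, 8M = 2^23 partitions).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class NdvMergeSketch {

    // Merge per-partition sketches by OR-ing the corresponding bit vectors.
    // One pass over all partitions, so the work grows linearly with their count.
    static BitSet[] merge(List<BitSet[]> partitions, int numVectors) {
        BitSet[] merged = new BitSet[numVectors];
        for (int i = 0; i < numVectors; i++) {
            merged[i] = new BitSet();
        }
        for (BitSet[] sketch : partitions) {
            for (int i = 0; i < numVectors; i++) {
                merged[i].or(sketch[i]);
            }
        }
        return merged;
    }

    // Average memory per partition implied by a total footprint.
    static long bytesPerPartition(long totalBytes, long partitionCount) {
        return totalBytes / partitionCount;
    }

    public static void main(String[] args) {
        int numVectors = 16;    // assumed value, for illustration only
        int bitsPerVector = 32; // assumed value, for illustration only

        // Build synthetic per-partition sketches.
        List<BitSet[]> partitions = new ArrayList<>();
        for (int p = 0; p < 100_000; p++) {
            BitSet[] sketch = new BitSet[numVectors];
            for (int i = 0; i < numVectors; i++) {
                sketch[i] = new BitSet(bitsPerVector);
                sketch[i].set((p + i) % bitsPerVector);
            }
            partitions.add(sketch);
        }

        long start = System.nanoTime();
        BitSet[] merged = merge(partitions, numVectors);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Merged " + partitions.size() + " partitions in "
                + elapsedMs + " ms; bits set in vector 0: "
                + merged[0].cardinality());

        // 2^34 bytes / 2^23 partitions = 2048 bytes per partition.
        System.out.println("Bytes per partition: "
                + bytesPerPartition(16L << 30, 8L << 20));
    }
}
```

Under those unit assumptions, the reported footprint works out to about 
2 KB of bit vectors per partition.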

> Use bit vector to track NDV
> ---------------------------
>
>                 Key: HIVE-12763
>                 URL: https://issues.apache.org/jira/browse/HIVE-12763
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch, 
> HIVE-12763.03.patch, HIVE-12763.04.patch, HIVE-12763.05.patch, 
> aggrStatsPerformance.png, performanceOfMergingNDV.png
>
>
> This will improve merging of per-partition stats. It will also help merge 
> NDV for auto-gather column stats.
> !aggrStatsPerformance.png|thumbnail!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)