[
https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119953#comment-15119953
]
Pengcheng Xiong commented on HIVE-12763:
----------------------------------------
As per [~jpullokkaran]'s request, I extracted the merging function in
NumDistinctValueEstimator and posted the performance of merging NDVs alone,
without running Hive. In the extreme case of 8M partitions, merging takes
around 9 seconds, and the time grows linearly with the number of
partitions. Moreover, 8M partitions need 16GB of memory to hold the bit vectors.
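A minimal Java sketch of why the merge cost grows linearly, assuming a Flajolet-Martin style estimator in which each partition stores an array of bit vectors and two estimators are merged by OR-ing corresponding vectors. The class name NdvMergeSketch and the vector count and width are hypothetical; they are not Hive's NumDistinctValueEstimator API or its defaults. The sketch also does the back-of-the-envelope arithmetic behind the memory figure: 16GB spread over 8M partitions is roughly 2KB of bit vectors per partition.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class NdvMergeSketch {
    // Hypothetical sizing for illustration only (not Hive's defaults).
    static final int NUM_BIT_VECTORS = 16;
    static final int BITS_PER_VECTOR = 31;

    // One partition's FM-style NDV state: an array of bit vectors.
    static final class PartitionBits {
        final BitSet[] vectors = new BitSet[NUM_BIT_VECTORS];

        PartitionBits() {
            for (int i = 0; i < NUM_BIT_VECTORS; i++) {
                vectors[i] = new BitSet(BITS_PER_VECTOR);
            }
        }

        // Merging two FM-style sketches is a bitwise OR of corresponding vectors.
        void mergeFrom(PartitionBits other) {
            for (int i = 0; i < NUM_BIT_VECTORS; i++) {
                vectors[i].or(other.vectors[i]);
            }
        }
    }

    // Aggregate stats across partitions: one pass, so the work is
    // proportional to (number of partitions) * NUM_BIT_VECTORS.
    static PartitionBits mergeAll(List<PartitionBits> partitions) {
        PartitionBits acc = new PartitionBits();
        for (PartitionBits p : partitions) {
            acc.mergeFrom(p);
        }
        return acc;
    }

    // Back-of-the-envelope: average bit-vector bytes held per partition.
    static long bytesPerPartition(long totalBytes, long partitions) {
        return totalBytes / partitions;
    }

    public static void main(String[] args) {
        List<PartitionBits> parts = new ArrayList<>();
        for (int p = 0; p < 4; p++) {
            PartitionBits pb = new PartitionBits();
            pb.vectors[0].set(p); // each partition sets a different bit
            parts.add(pb);
        }
        PartitionBits merged = mergeAll(parts);
        System.out.println(merged.vectors[0]); // prints {0, 1, 2, 3}

        long totalBytes = 16L * 1024 * 1024 * 1024; // 16 GiB
        long partitions = 8L * 1024 * 1024;         // 8Mi partitions
        System.out.println(bytesPerPartition(totalBytes, partitions)); // prints 2048
    }
}
```

Each partition is visited once and each merge touches a fixed number of vectors, which is consistent with the linear growth in merge time reported above. Turning the merged vectors back into an NDV estimate is omitted from this sketch.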
> Use bit vector to track NDV
> ---------------------------
>
> Key: HIVE-12763
> URL: https://issues.apache.org/jira/browse/HIVE-12763
> Project: Hive
> Issue Type: Improvement
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch,
> HIVE-12763.03.patch, HIVE-12763.04.patch, HIVE-12763.05.patch,
> aggrStatsPerformance.png, performanceOfMergingNDV.png
>
>
> This will improve merging of per-partition stats. It will also help merge
> NDVs for auto-gathered column stats.
> !aggrStatsPerformance.png|thumbnail!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)