[
https://issues.apache.org/jira/browse/HIVE-23095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071690#comment-17071690
]
Zoltan Haindrich commented on HIVE-23095:
-----------------------------------------
The
[getSize()|https://github.com/apache/hive/blob/d2ad5b061706a1d3cd55e59c769ed4f2af01cdbe/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HLLSparseRegister.java#L152]
method was changed in HIVE-19578 to also count the tempList size, which makes
{{getSize}} overestimate the actual size. Because there is a limit value at
which the SPARSE/DENSE switch happens, that switch could be triggered for far
fewer values than the limit [in
HyperLogLog.add|https://github.com/apache/hive/blob/d2ad5b061706a1d3cd55e59c769ed4f2af01cdbe/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L261]
> NDV might be overestimated for a table with ~70 value
> -----------------------------------------------------
>
> Key: HIVE-23095
> URL: https://issues.apache.org/jira/browse/HIVE-23095
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
>
> Uncovered while looking into HIVE-23082
> https://issues.apache.org/jira/browse/HIVE-23082?focusedCommentId=17067773&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17067773