[ 
https://issues.apache.org/jira/browse/HIVE-29275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035793#comment-18035793
 ] 

Krisztian Kasa commented on HIVE-29275:
---------------------------------------

I investigated the issue and found that Hive uses the 
[compareTo|https://github.com/apache/hive/blob/2e1af9b37b998d721c31132d231fa13fc4375353/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Decimal.java#L289-L318]
 method generated by Thrift when merging the existing and newly inserted column 
stats. However, this method does not compare the actual decimal values; it 
compares the internal fields that represent them. For -123.2 and -10.2 the 
scale is the same, so the outcome is ultimately decided by 
{{ByteBuffer.compareTo}} on the unscaled bytes.
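
The effect can be reproduced outside Hive with a minimal sketch. It assumes the unscaled value is stored as the big-endian two's-complement bytes returned by {{BigInteger.toByteArray()}}. The class {{DecimalCompareSketch}} and the helper {{unscaledBytes}} are hypothetical names used for illustration only and are not part of Hive.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

public class DecimalCompareSketch {

    // Hypothetical helper (not a Hive API): wraps the unscaled value of a
    // BigDecimal as big-endian two's-complement bytes. That Hive uses this
    // exact encoding is an assumption of this sketch.
    static ByteBuffer unscaledBytes(BigDecimal d) {
        return ByteBuffer.wrap(d.unscaledValue().toByteArray());
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("-123.2"); // unscaled -1232, scale 1
        BigDecimal b = new BigDecimal("-10.2");  // unscaled -102,  scale 1

        // Lexicographic comparison of the raw bytes, as ByteBuffer.compareTo does.
        int byBytes = unscaledBytes(a).compareTo(unscaledBytes(b));

        // Numeric comparison of the decimal values.
        int byValue = a.compareTo(b);

        System.out.println("byte-wise: " + Integer.signum(byBytes)); // prints "byte-wise: 1"
        System.out.println("numeric:   " + Integer.signum(byValue)); // prints "numeric:   -1"
    }
}
```

Under this assumption -1232 encodes as the bytes 0xFB 0x30 and -102 as the single byte 0x9A. {{ByteBuffer.compareTo}} compares bytes as signed values, and 0xFB (-5) is greater than 0x9A (-102), so -123.2 is ordered after -10.2 and -10.2 is kept as the minimum.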

The {{compareTo}} method may be called in several places. The one that is 
relevant for this issue is in the 
[DecimalColumnStatsMerger|https://github.com/apache/hive/blob/2e1af9b37b998d721c31132d231fa13fc4375353/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/columnstats/merge/DecimalColumnStatsMerger.java#L85]
 class.


> Stats autogather calculates the min statistic incorrectly
> ---------------------------------------------------------
>
>                 Key: HIVE-29275
>                 URL: https://issues.apache.org/jira/browse/HIVE-29275
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.1.0, 4.0.1
>            Reporter: Thomas Rebele
>            Priority: Major
>              Labels: hive-4.2.0-candidate
>
>  In stats_histogram.q autogather gets enabled and then rows are inserted into 
> a newly created table. The minimum value for column e is 
> [-123.2|https://github.com/apache/hive/blob/55d9ab7d6b00fa510be791b9de55974f61c90519/ql/src/test/queries/clientpositive/stats_histogram.q#L20].
>  However, {{DESCRIBE FORMATTED test_stats e}} shows 
> [-10.2|https://github.com/apache/hive/blob/55d9ab7d6b00fa510be791b9de55974f61c90519/ql/src/test/results/clientpositive/llap/stats_histogram.q.out#L364]
>  as the minimum value.
> When executing {{ANALYZE TABLE test_stats COMPUTE STATISTICS FOR COLUMNS;}} 
> before the {{DESCRIBE FORMATTED test_stats e}} command, the [min value is 
> -123.2|https://github.com/thomasrebele/hive/commit/2be9bef2851028678fa6752f7482080b3d201a51#diff-436ceeced7ea88c3ad4d931cfbf3d09feb838eef368a74ca8106d378209b1209L262-L364]
>  as expected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
