[ 
https://issues.apache.org/jira/browse/HIVE-23390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931888#comment-17931888
 ] 

Shohei Okumiya commented on HIVE-23390:
---------------------------------------

I can reproduce the issue with Hive 3.1.3 for both table and partition column 
statistics.

[https://github.com/okumin/hive/commit/7fe03ca2aa6925349e046d582969aabf83b5b282]

 

On the master branch, the test for table column statistics passes: every 
`HiveMetaStoreClient#updateTableColumnStatistics` call succeeds. The test for 
partition column statistics still fails.

[https://github.com/okumin/hive/commit/013fdf83bbaa74dc7c10c311fee813528202bde1]
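
For readers unfamiliar with this class of bug, here is a minimal, self-contained sketch of one plausible mechanism. It assumes the duplicates come from a check-then-insert sequence with no unique constraint on (database, table, column); that is an assumption, not something the tests above confirm. `StatsRaceSketch` and its methods are hypothetical and do not use any Hive API. A latch forces both writers to finish their existence check before either inserts, mimicking two clusters running the first analyze at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StatsRaceSketch {
    // Hypothetical in-memory stand-in for a stats table WITHOUT a unique
    // constraint on the logical key (db, table, column).
    private final List<String> rows = new ArrayList<>();

    synchronized boolean exists(String key) {
        return rows.contains(key);
    }

    synchronized void insert(String key) {
        rows.add(key);
    }

    synchronized int count(String key) {
        int n = 0;
        for (String r : rows) {
            if (r.equals(key)) {
                n++;
            }
        }
        return n;
    }

    // Runs two "first-time" writers for the same key and returns how many
    // rows exist for that key afterwards.
    static int simulateConcurrentFirstWrites() throws Exception {
        StatsRaceSketch store = new StatsRaceSketch();
        String key = "default.dual.dummy";
        int writers = 2;
        CountDownLatch allChecked = new CountDownLatch(writers);
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < writers; i++) {
                futures.add(pool.submit(() -> {
                    boolean found = store.exists(key); // step 1: check
                    allChecked.countDown();
                    allChecked.await();                // wait until every writer has checked
                    if (!found) {
                        store.insert(key);             // step 2: insert based on a stale check
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any failure from the writers
            }
        } finally {
            pool.shutdown();
        }
        return store.count(key);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("rows for key: " + simulateConcurrentFirstWrites());
    }
}
```

In this forced interleaving both writers see no existing row, so both insert. A unique constraint on the key, or an atomic upsert, would make the second insert fail or turn it into an update.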
 

> Duplicate entry for a table in TAB_COL_STATS 
> ---------------------------------------------
>
>                 Key: HIVE-23390
>                 URL: https://issues.apache.org/jira/browse/HIVE-23390
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.3.4
>            Reporter: Mithun Antony
>            Priority: Major
>
> When the *_analyze <table>_* command was executed from Presto to update the 
> stats of a table for the first time from multiple clusters sharing the same 
> Hive metastore, duplicate entries for the same table were inserted into the 
> *_TAB_COL_STATS_* table.
> This led to failures when executing further *_analyze <table>_* commands. 
> {code:java}
> Query failed: Multiple entries with same key: 
> dummy=HiveColumnStatistics{integerStatistics=Optional[IntegerStatistics{min=OptionalLong[1],
>  max=OptionalLong[1]}], doubleStatistics=Optional.empty, 
> decimalStatistics=Optional.empty, dateStatistics=Optional.empty, 
> booleanStatistics=Optional.empty, maxValueSizeInBytes=OptionalLong.empty, 
> totalSizeInBytes=OptionalLong.empty, nullsCount=OptionalLong[0], 
> distinctValuesCount=OptionalLong[1]} and 
> dummy=HiveColumnStatistics{integerStatistics=Optional[IntegerStatistics{min=OptionalLong[1],
>  max=OptionalLong[1]}], doubleStatistics=Optional.empty, 
> decimalStatistics=Optional.empty, dateStatistics=Optional.empty, 
> booleanStatistics=Optional.empty, maxValueSizeInBytes=OptionalLong.empty, 
> totalSizeInBytes=OptionalLong.empty, nullsCount=OptionalLong[0], 
> distinctValuesCount=OptionalLong[1]}.
> {code}
> Duplicate records in the *_TAB_COL_STATS_* table:
> {code:java}
> '7','default','dual','dummy','smallint','245671','1','1',NULL,NULL,NULL,NULL,'0','1',NULL,NULL,NULL,NULL,'1588345509'
>  
> '11','default','dual','dummy','smallint','245671','1','1',NULL,NULL,NULL,NULL,'0','1',NULL,NULL,NULL,NULL,'1588345509'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
