[ https://issues.apache.org/jira/browse/HIVE-28702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shohei Okumiya updated HIVE-28702:
----------------------------------
    Summary: Statistics are inconsistent on time travel queries  (was: Time travel queries calculate incorrect statistics)

> Statistics are inconsistent on time travel queries
> --------------------------------------------------
>
>                 Key: HIVE-28702
>                 URL: https://issues.apache.org/jira/browse/HIVE-28702
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration, Statistics
>    Affects Versions: 4.0.1
>            Reporter: Shohei Okumiya
>            Assignee: Shohei Okumiya
>            Priority: Major
>         Attachments: image-2025-01-12-21-23-02-639.png
>
>
> Time-travel queries using a snapshot id, timestamp, branching, or tagging can run with incorrect statistics.
> This set of queries reproduces the problem.
> {code:java}
> SET hive.fetch.task.conversion=none;
> CREATE TABLE default.test (i1 INT, i2 INT) STORED BY ICEBERG;
> INSERT INTO default.test VALUES (1, 11), (2, 22);
> ALTER TABLE default.test CREATE TAG with_2_records;
> EXPLAIN SELECT * FROM default.test.tag_with_2_records;
> INSERT INTO default.test VALUES (null, null), (null, null), (null, null), (null, null), (null, null), (null, null), (null, null), (null, null), (null, null), (null, null), (null, null), (null, null);
> EXPLAIN SELECT * FROM default.test.tag_with_2_records;
> {code}
> The first EXPLAIN shows the correct statistics, with 2 records.
> {code:java}
> | Map 1 |
> |   Map Operator Tree: |
> |     TableScan |
> |       alias: test |
> |       Snapshot ref: tag_with_2_records |
> |       Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE |
> |       Select Operator |
> |         expressions: i1 (type: int), i2 (type: int) |
> |         outputColumnNames: _col0, _col1 |
> |         Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}
> The data size is broken after I run the second INSERT query.
> {code:java}
> | Map 1 |
> |   Map Operator Tree: |
> |     TableScan |
> |       alias: test |
> |       Snapshot ref: tag_with_2_records |
> |       Statistics: Num rows: 2 Data size: 6610 Basic stats: COMPLETE Column stats: COMPLETE |
> |       Select Operator |
> |         expressions: i1 (type: int), i2 (type: int) |
> |         outputColumnNames: _col0, _col1 |
> |         Statistics: Num rows: 2 Data size: -72 Basic stats: COMPLETE Column stats: COMPLETE |
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)