[
https://issues.apache.org/jira/browse/HIVE-29235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024054#comment-18024054
]
Jaeho Yoo commented on HIVE-29235:
----------------------------------
[~zhangbutao]
We first used Flink to produce data in Iceberg format (without the Flink
option `write.metadata.statistics.enabled`). The table has 2 partitions and
about 30 other columns, with log_date (string type) as the partition field.
Then we simply ran `SELECT count(*) FROM db1.tb1 WHERE log_date = '2025-09-29'`.
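
As a quick sanity check on the numbers in the issue description quoted below,
the value returned by both filtered queries equals the sum of the two
per-partition counts. This suggests, though it does not prove, that the
partition filter is ignored when the count is answered from statistics. The
variable names here are illustrative only; the figures are copied from the
report:

```python
# Per-partition counts copied from the GROUP BY output in the issue description.
partition_counts = {"2025-09-29": 343662, "2025-09-30": 2513459}

# Value returned by both filtered count(*) queries in the report.
reported_filtered_count = 2857121

# If the filter were ignored, the filtered count would equal the table total.
table_total = sum(partition_counts.values())
matches_total = table_total == reported_filtered_count

print(table_total, matches_total)  # prints: 2857121 True
```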
> Iceberg returns incorrect count value
> -------------------------------------
>
> Key: HIVE-29235
> URL: https://issues.apache.org/jira/browse/HIVE-29235
> Project: Hive
> Issue Type: Bug
> Reporter: Jaeho Yoo
> Priority: Major
>
> For Iceberg tables, Hive tries to read partitionStatistics.
> If the table doesn't have them, Hive computes the count from default
> statistics, which gives an incorrect result.
> We are using Hive 4.1.0.
> SELECT count(*), log_date FROM db1.tbl1 GROUP BY 2;
> +----------+-------------+
> |   _c0    |  log_date   |
> +----------+-------------+
> | 343662   | 2025-09-29  |
> | 2513459  | 2025-09-30  |
> +----------+-------------+
>
> SELECT count(*) FROM db1.tb1 WHERE log_date = '2025-09-29'; // 2857121
> SELECT count(*) FROM db1.tb1 WHERE log_date = '2025-09-30'; // 2857121
>