[
https://issues.apache.org/jira/browse/IMPALA-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617875#comment-17617875
]
LiPenglin commented on IMPALA-11608:
------------------------------------
Hey [~boroknagyz], could you please assign this Jira to me?
While fixing this problem, I would also like to make Impala pre-load the
data_location of Iceberg tables instead of the table_location:
1. Obtain the data_location of the Iceberg table via
(https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Table.java#L309
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/LocationProviders.java#L89)
before the code line
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L357
2. Pass the data_location through hdfsTable_.load(...) to
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1358
3. Make 'partDir_' in the 'FileMetadataLoader.load()' method the
data_location of the Iceberg table instead of the table_location.
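To illustrate step 1, here is a minimal, self-contained sketch of how a data_location could be derived from the table properties and the table_location. It is not the Iceberg or Impala code: the class and method names are hypothetical, and the property keys ("write.data.path", "write.folder-storage.path") and the "<table_location>/data" fallback are my assumptions about what the linked LocationProviders code does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Impala or Iceberg.
public class DataLocationSketch {
    // Assumed property keys; verify against Iceberg's TableProperties.
    static final String WRITE_DATA_PATH = "write.data.path";
    static final String WRITE_FOLDER_STORAGE_PATH = "write.folder-storage.path";

    // Returns the directory that should hold data files:
    // an explicitly configured path if present, else "<table_location>/data".
    static String resolveDataLocation(Map<String, String> props, String tableLocation) {
        String location = props.get(WRITE_DATA_PATH);
        if (location == null) {
            location = props.get(WRITE_FOLDER_STORAGE_PATH);
        }
        if (location == null) {
            location = stripTrailingSlash(tableLocation) + "/data";
        }
        return stripTrailingSlash(location);
    }

    // Removes trailing '/' characters so joined paths do not contain "//".
    static String stripTrailingSlash(String path) {
        String result = path;
        while (result.endsWith("/")) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // No override configured: falls back to the "data" subdirectory.
        System.out.println(resolveDataLocation(
            props, "hdfs://localhost:20500/test-warehouse/test"));

        // Explicit override (hypothetical path) takes precedence.
        props.put(WRITE_DATA_PATH, "hdfs://localhost:20500/custom/data/");
        System.out.println(resolveDataLocation(
            props, "hdfs://localhost:20500/test-warehouse/test"));
    }
}
```

With a location resolved this way, the file metadata loading in steps 2 and 3 would list only the data directory rather than the whole table directory, so the metadata files would no longer be picked up.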
> Impala SHOW TABLE STATS shows wrong number of files for Iceberg tables
> ----------------------------------------------------------------------
>
> Key: IMPALA-11608
> URL: https://issues.apache.org/jira/browse/IMPALA-11608
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg, ramp-up
>
> Impala SHOW TABLE STATS outputs the wrong number of files for Iceberg
> tables. It should only calculate the number of data files, but it calculates
> all files under the table directory, including metadata files, orphaned
> files, and old data files not belonging to the current snapshot.
> It should only output the number of data files in the current snapshot,
> making the output consistent with SHOW FILES IN tbl;
> {noformat}
> create table test (i int) stored as iceberg;
> compute stats test;
> show table stats test;
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                   |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> | -1    | 2      | 2.70KB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> {noformat}
> SHOW TABLE STATS is handled here:
> https://github.com/apache/impala/blob/66484a4c081f3242750a3a0e04159dd4580b37a4/fe/src/main/java/org/apache/impala/service/Frontend.java#L1429-L1457