[ https://issues.apache.org/jira/browse/IMPALA-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617875#comment-17617875 ]

LiPenglin commented on IMPALA-11608:
------------------------------------

Hey [~boroknagyz], could you please assign this Jira to me?

While fixing this problem, I would also like Impala to pre-load file metadata
from the data_location of Iceberg tables instead of from the table_location.

1. Obtain the data_location of the Iceberg table before this line:
   https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L357
   The data_location can be obtained via:
   https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/Table.java#L309
   https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/LocationProviders.java#L89
2. Pass the data_location through hdfsTable_.load(...) to:
   https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1358
3. In FileMetadataLoader.load(), set 'partDir_' to the data_location of the
   Iceberg table instead of the table_location.
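
A rough sketch of the idea behind these steps, as a self-contained illustration rather than Impala or Iceberg code. DataLocationSketch, dataLocation() and filesUnder() are hypothetical names. The fallback order (write.data.path, then the older write.folder-storage.path, then <table_location>/data) is my understanding of Iceberg's default LocationProviders behavior and should be checked against the linked source. The point it shows: a file listing rooted at the data location no longer picks up the files under metadata/.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, self-contained sketch; not Impala or Iceberg code.
class DataLocationSketch {
  // Assumed Iceberg table property keys, checked in this order.
  static final String WRITE_DATA_PATH = "write.data.path";
  static final String WRITE_FOLDER_STORAGE_PATH = "write.folder-storage.path";

  /** Resolves where data files live: explicit properties first, else "<table_location>/data". */
  static String dataLocation(Map<String, String> props, String tableLocation) {
    String loc = props.get(WRITE_DATA_PATH);
    if (loc == null) {
      loc = props.get(WRITE_FOLDER_STORAGE_PATH);
    }
    if (loc == null) {
      loc = stripTrailingSlash(tableLocation) + "/data";
    }
    return stripTrailingSlash(loc);
  }

  /** Removes trailing '/' characters so the prefix check below behaves consistently. */
  static String stripTrailingSlash(String path) {
    String p = path;
    while (p.length() > 1 && p.endsWith("/")) {
      p = p.substring(0, p.length() - 1);
    }
    return p;
  }

  /** Keeps only files under 'partDir', i.e. what a listing rooted there would see. */
  static List<String> filesUnder(String partDir, List<String> allFiles) {
    String prefix = partDir + "/";
    return allFiles.stream().filter(f -> f.startsWith(prefix)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String tableLocation = "hdfs://localhost:20500/test-warehouse/test";
    List<String> listing = List.of(
        tableLocation + "/metadata/00000-example.metadata.json",
        tableLocation + "/metadata/snap-example.avro",
        tableLocation + "/data/example.parquet");

    // No data-path properties are set, so the default "<table_location>/data" is used.
    String partDir = dataLocation(Map.of(), tableLocation);
    System.out.println(partDir); // hdfs://localhost:20500/test-warehouse/test/data
    System.out.println(filesUnder(partDir, listing).size()); // 1
  }
}
```

Rooting 'partDir_' at the data location only removes metadata files from the listing; orphaned and old data files under data/ would still be counted.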

> Impala SHOW TABLE STATS shows wrong number of files for Iceberg tables
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-11608
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11608
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg, ramp-up
>
> Impala SHOW TABLE STATS outputs the wrong value for the number of files for
> Iceberg tables. It should only count data files, but it counts all files
> under the table directory, including metadata files, orphaned files, and old
> data files not belonging to the current snapshot.
> It should only output the number of data files in the current snapshot,
> making the output consistent with SHOW FILES IN tbl;
> {noformat}
> create table test (i int) stored as iceberg;
> compute stats test;
> show table stats test;
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                   |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> | -1    | 2      | 2.70KB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> {noformat}
> SHOW TABLE STATS is handled here: 
> https://github.com/apache/impala/blob/66484a4c081f3242750a3a0e04159dd4580b37a4/fe/src/main/java/org/apache/impala/service/Frontend.java#L1429-L1457
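
A minimal sketch of the #Files/Size semantics the issue asks for, assuming the set of data file paths in the current snapshot is already available (how Impala would obtain it is not shown; here it is just a hypothetical Set<String>). TableStatsSketch, FileStats, statsFromListing() and statsForCurrentSnapshot() are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, self-contained sketch; not Impala code.
class TableStatsSketch {
  /** Holder for the #Files and Size columns of SHOW TABLE STATS. */
  static final class FileStats {
    final int numFiles;
    final long totalBytes;

    FileStats(int numFiles, long totalBytes) {
      this.numFiles = numFiles;
      this.totalBytes = totalBytes;
    }
  }

  /** Behavior described in the issue: every file found under the table directory. */
  static FileStats statsFromListing(Map<String, Long> listing) {
    long bytes = 0;
    for (long size : listing.values()) {
      bytes += size;
    }
    return new FileStats(listing.size(), bytes);
  }

  /** Intended behavior: only listed files that the current snapshot references. */
  static FileStats statsForCurrentSnapshot(Map<String, Long> listing, Set<String> currentSnapshotDataFiles) {
    int count = 0;
    long bytes = 0;
    for (Map.Entry<String, Long> e : listing.entrySet()) {
      if (currentSnapshotDataFiles.contains(e.getKey())) {
        count++;
        bytes += e.getValue();
      }
    }
    return new FileStats(count, bytes);
  }

  public static void main(String[] args) {
    // Path -> size in bytes, as a directory listing might report them (made-up values).
    Map<String, Long> listing = new LinkedHashMap<>();
    listing.put("t/metadata/v1.metadata.json", 1500L);
    listing.put("t/metadata/snap-1.avro", 1200L);
    listing.put("t/data/old.parquet", 400L);    // dropped from the current snapshot
    listing.put("t/data/orphan.parquet", 300L); // never committed
    listing.put("t/data/current.parquet", 500L);
    Set<String> current = Set.of("t/data/current.parquet");

    FileStats all = statsFromListing(listing);
    FileStats snap = statsForCurrentSnapshot(listing, current);
    System.out.println("#Files=" + all.numFiles + " Size=" + all.totalBytes);   // #Files=5 Size=3900
    System.out.println("#Files=" + snap.numFiles + " Size=" + snap.totalBytes); // #Files=1 Size=500
  }
}
```

Under these semantics the empty table in the example above would presumably report #Files = 0, matching SHOW FILES IN test.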



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
