[ 
https://issues.apache.org/jira/browse/IMPALA-11577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Fürnstáhl updated IMPALA-11577:
---------------------------------------
    Description: 
Spawned from IMPALA-10610
Impala supports mixed file formats for Iceberg tables, which means every data
file can have a different file format. Impala uses the set of file formats
present in a table for planning purposes. Currently Impala goes through the
metadata of every data file to aggregate this information, which can be slow
if there are many data files.

We could optimize this by storing the aggregated information somewhere (e.g.
in Iceberg, where this is yet to be implemented:
[https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java])

  was:
Impala supports mixed file formats for Iceberg tables, which means every data
file can have a different file format. Impala uses the set of file formats
present in a table for planning purposes. Currently Impala goes through the
metadata of every data file to aggregate this information, which can be slow
if there are many data files.

We could optimize this by storing the aggregated information somewhere (e.g.
in Iceberg, where this is yet to be implemented:
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java)


> Optimize getting stored file types for Iceberg tables
> -----------------------------------------------------
>
>                 Key: IMPALA-11577
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11577
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Gergely Fürnstáhl
>            Priority: Major
>
> Spawned from IMPALA-10610
> Impala supports mixed file formats for Iceberg tables, which means every data
> file can have a different file format. Impala uses the set of file formats
> present in a table for planning purposes. Currently Impala goes through the
> metadata of every data file to aggregate this information, which can be slow
> if there are many data files.
> We could optimize this by storing the aggregated information somewhere
> (e.g. in Iceberg, where this is yet to be implemented:
> [https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java])
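
To make the proposal concrete, here is a minimal sketch of the two approaches
described above: scanning every data file's format versus reading a
pre-aggregated value from a string-to-string summary map. Everything in it is
an assumption for illustration only: the class FileFormatSummary, the Format
enum, and the "file-formats" key are hypothetical and are neither Impala code
nor an existing Iceberg summary property.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class FileFormatSummary {
    // Hypothetical stand-in for the per-file format recorded in file metadata.
    public enum Format { PARQUET, ORC, AVRO }

    // Hypothetical summary key; per the issue, no such property exists yet.
    public static final String FORMATS_KEY = "file-formats";

    // Current approach: visit every data file, so cost grows with file count.
    public static Set<Format> scanFormats(List<Format> perFileFormats) {
        Set<Format> seen = EnumSet.noneOf(Format.class);
        seen.addAll(perFileFormats);
        return seen;
    }

    // Write side: encode the aggregated set as a comma-separated string so it
    // fits into a string-to-string summary map.
    public static String encode(Set<Format> formats) {
        return formats.stream().map(Enum::name).collect(Collectors.joining(","));
    }

    // Read side: use the stored value when present, otherwise fall back to
    // the full scan (e.g. for snapshots written before the key existed).
    public static Set<Format> formatsFor(Map<String, String> summary,
                                         List<Format> perFileFormats) {
        String stored = summary.get(FORMATS_KEY);
        if (stored == null || stored.isEmpty()) {
            return scanFormats(perFileFormats);
        }
        Set<Format> result = EnumSet.noneOf(Format.class);
        for (String name : stored.split(",")) {
            result.add(Format.valueOf(name.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Format> files = List.of(Format.PARQUET, Format.ORC, Format.PARQUET);

        // Aggregate once at write time and store the result in the summary.
        Map<String, String> summary = new HashMap<>();
        summary.put(FORMATS_KEY, encode(scanFormats(files)));

        // Planning can now read the set without touching per-file metadata.
        System.out.println(formatsFor(summary, List.of()));
    }
}
```

The fallback to a scan keeps the read path correct for tables whose summary
lacks the key; whether the aggregate belongs in Iceberg or elsewhere is left
open, as in the description.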



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
