[ 
https://issues.apache.org/jira/browse/IMPALA-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noemi Pap-Takacs reopened IMPALA-10569:
---------------------------------------

Currently this information is available only at planning time, in 
IcebergScanNode. It should be available earlier, at table load time.

It would be nice to provide it as the implementation of 
FeIcebergTable.getFileFormats(). That method currently falls back to 
HdfsTable.getFileFormats(), which relies on HDFS metadata and cannot handle 
multiple file formats per partition, so it is not a usable implementation 
for Iceberg.
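
A minimal sketch of what such an implementation might compute, assuming the 
per-file format is already known from Iceberg metadata at load time. The 
class, enum and method names below (IcebergFileFormatSketch, FileFormat, 
DataFileEntry, collectFileFormats) are hypothetical stand-ins for 
illustration, not Impala's or Iceberg's actual types:

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: every name in this class is hypothetical.
class IcebergFileFormatSketch {
  /** Stand-in for the data file formats an Iceberg table may contain. */
  enum FileFormat { PARQUET, ORC, AVRO }

  /** Stand-in for one data file entry as read from Iceberg metadata. */
  static final class DataFileEntry {
    final String path;
    final FileFormat format;

    DataFileEntry(String path, FileFormat format) {
      this.path = path;
      this.format = format;
    }
  }

  /**
   * Returns the distinct formats across all data files of the table.
   * The result is independent of partitioning, so a partition that mixes
   * formats simply contributes more than one element to the set.
   */
  static Set<FileFormat> collectFileFormats(Collection<DataFileEntry> files) {
    Set<FileFormat> formats = EnumSet.noneOf(FileFormat.class);
    for (DataFileEntry file : files) {
      formats.add(file.format);
    }
    return formats;
  }
}
```

For a table whose files are a mix of Parquet and ORC, collectFileFormats() 
would return both formats, which the partition-based HDFS fallback cannot 
express.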

> Impala should determine Iceberg data file format from Iceberg metadata
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-10569
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10569
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> When Impala creates an Iceberg table it sets HMS table property 
> 'iceberg.file_format' to indicate the underlying data file format.
> However, when the table was created by Hive or Spark, we don't have this 
> property and Impala assumes that the data file format is PARQUET. This 
> assumption is just a wild guess, and when it's wrong Impala raises an error 
> during query execution.
> Instead of only checking the table property, Impala could also try to 
> determine the file format based on Iceberg metadata.



