[
https://issues.apache.org/jira/browse/IMPALA-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Boglarka Egyed reassigned IMPALA-12667:
---------------------------------------
Assignee: Zoltán Borók-Nagy
> Mention interoperability considerations for Iceberg table conversion
> --------------------------------------------------------------------
>
> Key: IMPALA-12667
> URL: https://issues.apache.org/jira/browse/IMPALA-12667
> Project: IMPALA
> Issue Type: Documentation
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
>
> When Impala writes legacy tables with STRING columns, it doesn't add the UTF8
> annotation to the Parquet files. It omits the annotation because users might
> store binary data in STRING columns (Impala has only recently added support
> for BINARY columns).
> When a legacy table is converted to Iceberg, the data files are not
> re-written, i.e. we just create the Iceberg metadata files over the existing
> data files.
> The Iceberg spec requires STRING columns to be stored with the UTF8 annotation
> in Parquet files. Non-Impala readers might throw exceptions when they find
> STRING columns without the UTF8 annotation.
> Add a section about the above in the docs. Also mention CTAS statements as a
> possible workaround.
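>
> A minimal sketch of the CTAS workaround, assuming a legacy Parquet table
> named legacy_tbl and a target table named ice_tbl (both names are
> hypothetical, and the exact syntax may vary by Impala version). Unlike the
> in-place conversion, CTAS rewrites the data files, so the STRING columns can
> be written with the UTF8 annotation:
>
> ```sql
> -- In-place conversion, shown for contrast: it only creates Iceberg metadata
> -- over the existing data files, so the missing UTF8 annotation remains.
> -- ALTER TABLE legacy_tbl CONVERT TO ICEBERG;
>
> -- Workaround: create a new Iceberg table and copy the rows into it.
> -- The data files are newly written instead of being reused.
> CREATE TABLE ice_tbl STORED AS ICEBERG AS
> SELECT * FROM legacy_tbl;
> ```
>
> STRING columns that really hold binary data would likely need separate
> handling, since their contents are not guaranteed to be valid UTF-8.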
--
This message was sent by Atlassian Jira
(v8.20.10#820010)