[ https://issues.apache.org/jira/browse/IMPALA-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boglarka Egyed reassigned IMPALA-12667:
---------------------------------------

    Assignee: Zoltán Borók-Nagy

> Mention interoperability considerations for Iceberg table conversion
> --------------------------------------------------------------------
>
>                 Key: IMPALA-12667
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12667
>             Project: IMPALA
>          Issue Type: Documentation
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> When Impala writes legacy tables with STRING columns, it does not add the 
> UTF8 annotation to the Parquet files. It omits the annotation because users 
> might store binary data in STRING columns (Impala added support for BINARY 
> columns only recently).
> When a legacy table is converted to Iceberg, the data files are not 
> rewritten; Impala only creates the Iceberg metadata files over the existing 
> data files.
> The Iceberg spec requires STRING columns to be stored with the UTF8 
> annotation in Parquet files. Non-Impala readers might throw exceptions when 
> they find STRING columns without the UTF8 annotation.
> Add a section about the above to the docs. Also mention CTAS statements as a 
> possible workaround.
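
A minimal sketch of the two paths described above, in Impala SQL. The table names legacy_tbl and ice_tbl are hypothetical. It assumes, as the issue suggests, that files written through a CTAS into an Iceberg table carry the UTF8 annotation; it does not apply if the STRING columns really hold binary data.

```sql
-- In-place conversion: only Iceberg metadata is created. The existing
-- Parquet files are kept as they are, without the UTF8 annotation.
ALTER TABLE legacy_tbl CONVERT TO ICEBERG;

-- Possible workaround: CTAS writes new data files for the Iceberg table
-- instead of reusing the legacy ones (hypothetical table names).
CREATE TABLE ice_tbl
STORED AS ICEBERG
AS SELECT * FROM legacy_tbl;
```

The CTAS route copies all the data, so it costs extra time and storage compared with the in-place conversion.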



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
