Zoltán Borók-Nagy created IMPALA-12667:
------------------------------------------
Summary: Mention interoperability considerations for Iceberg table
conversion
Key: IMPALA-12667
URL: https://issues.apache.org/jira/browse/IMPALA-12667
Project: IMPALA
Issue Type: Documentation
Reporter: Zoltán Borók-Nagy
When Impala writes legacy tables with STRING columns, it does not add the UTF8
annotation to the Parquet files. It omits the annotation because users might
store binary data in STRING columns (Impala added BINARY column support only
recently).
When a legacy table is converted to Iceberg, the data files are not rewritten,
i.e. we just create the Iceberg metadata files over the existing data files, so
the STRING columns in those files still lack the UTF8 annotation.
The Iceberg spec requires STRING columns to be stored with the UTF8 annotation
in Parquet files. Non-Impala readers might throw exceptions when they find
STRING columns without the UTF8 annotation.
Add a section about the above to the docs. Also mention CTAS (CREATE TABLE AS
SELECT) statements as a possible workaround, since they write new data files
instead of reusing the existing ones.
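
For illustration, the two paths could look roughly like the sketch below. The
table names legacy_tbl and ice_tbl are hypothetical, and the statements assume
an Impala version that supports both in-place conversion and Iceberg CTAS.

```sql
-- In-place conversion: only Iceberg metadata is created, the existing
-- Parquet data files (without the UTF8 annotation) are reused as-is.
ALTER TABLE legacy_tbl CONVERT TO ICEBERG;

-- Possible workaround: CTAS into a new Iceberg table, which writes new
-- data files rather than pointing at the legacy ones.
-- (legacy_tbl and ice_tbl are placeholder names.)
CREATE TABLE ice_tbl
STORED AS ICEBERG
AS SELECT * FROM legacy_tbl;
```

The CTAS route copies the data, so it costs extra storage and time compared to
the in-place conversion, and STRING columns that actually hold binary data may
still need separate handling.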
--
This message was sent by Atlassian Jira
(v8.20.10#820010)