Zoltán Borók-Nagy created IMPALA-12667:
------------------------------------------

             Summary: Mention interoperability considerations for Iceberg table conversion
                 Key: IMPALA-12667
                 URL: https://issues.apache.org/jira/browse/IMPALA-12667
             Project: IMPALA
          Issue Type: Documentation
            Reporter: Zoltán Borók-Nagy


When Impala writes legacy (non-Iceberg) tables with STRING columns, it doesn't add 
the UTF8 annotation to the Parquet files. This is because users might store binary 
data in STRING columns (Impala has only recently added support for BINARY columns).

When a legacy table is converted to Iceberg, the data files are not rewritten, 
i.e. only the Iceberg metadata files are created on top of the existing data files.

The Iceberg spec requires STRING columns to be stored with the UTF8 annotation in 
Parquet files. Non-Impala readers might throw exceptions when they encounter STRING 
columns without the UTF8 annotation.
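To make the failure mode concrete, here is a minimal sketch in Python. It is a simplified model, not a real Parquet or Iceberg API: the names ParquetColumn, strict_reader_accepts and unreadable_string_columns are hypothetical, and the check only models the spec rule that an Iceberg STRING column must be a Parquet BYTE_ARRAY carrying the UTF8 annotation. It assumes that rewritten data files (e.g. produced by CTAS) do carry the annotation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ParquetColumn:
    """Hypothetical, simplified view of one column in a Parquet file footer."""
    name: str
    physical_type: str          # e.g. "BYTE_ARRAY", "INT64"
    annotation: Optional[str]   # e.g. "UTF8"; None when the writer added none


def strict_reader_accepts(col: ParquetColumn) -> bool:
    """Model of a reader that follows the Iceberg spec strictly: an Iceberg
    STRING column must be a BYTE_ARRAY carrying the UTF8 annotation."""
    return col.physical_type == "BYTE_ARRAY" and col.annotation == "UTF8"


def unreadable_string_columns(string_cols: List[ParquetColumn]) -> List[str]:
    """Return the names of Iceberg STRING columns that a strict reader would
    reject, e.g. because a legacy Impala writer left them unannotated."""
    return [c.name for c in string_cols if not strict_reader_accepts(c)]


# A legacy Impala-written file: STRING stored as plain, unannotated BYTE_ARRAY.
# Converting the table to Iceberg keeps this file as-is.
legacy_file = [
    ParquetColumn("customer_name", "BYTE_ARRAY", None),
    ParquetColumn("city", "BYTE_ARRAY", None),
]
# The same data after the files are rewritten (e.g. via CTAS), assuming the
# writer now emits the UTF8 annotation.
rewritten_file = [
    ParquetColumn("customer_name", "BYTE_ARRAY", "UTF8"),
    ParquetColumn("city", "BYTE_ARRAY", "UTF8"),
]

print(unreadable_string_columns(legacy_file))     # ['customer_name', 'city']
print(unreadable_string_columns(rewritten_file))  # []
```

The model shows why an in-place conversion is not enough for strict readers: the Iceberg metadata declares the columns as STRING, but the footers of the untouched data files still lack the annotation, so only rewriting the files removes the mismatch.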

Add a section about the above to the docs. Also mention CTAS (CREATE TABLE AS 
SELECT) statements as a possible workaround, since they rewrite the data files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)