jackye1995 commented on PR #6449: URL: https://github.com/apache/iceberg/pull/6449#issuecomment-1371347815
Looking at the spec:

```
Write data files by using the physical name that is chosen for each column. The physical name of the column is static and can be different than the display name of the column, which is changeable.
Write the 32 bit integer column identifier as part of the field_id field of the SchemaElement struct in the [Parquet Thrift specification](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift).
Track partition values and column level statistics with the physical name of the column in the transaction log.
```

Because the column names in the underlying Parquet files differ from the display names, migrating such a table requires not only an Iceberg name mapping configuration but also converting the statistics retrieved from the Parquet files, since those are keyed by physical name. This sounds like something that can be added as a next step after this PR is merged.
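
To make the statistics-conversion step concrete, here is a minimal sketch in plain Python. It does not use any Iceberg or Delta API. The function name `remap_stats_to_display_names`, the physical column names, and the simplified stats layout (`minValues`, `maxValues`, `nullCount` as per-column dictionaries) are assumptions for illustration only.

```python
def remap_stats_to_display_names(stats, physical_to_display):
    """Return a copy of the stats with per-column entries keyed by display name.

    `stats` is assumed to map a statistic kind to either a scalar
    (for example a record count) or a dict keyed by physical column name.
    Columns missing from the mapping keep their original key.
    """
    remapped = {}
    for stat_kind, value in stats.items():
        if isinstance(value, dict):
            # Per-column statistic: rewrite each key from physical to display name.
            remapped[stat_kind] = {
                physical_to_display.get(physical, physical): column_value
                for physical, column_value in value.items()
            }
        else:
            # Table- or file-level scalar: copy through unchanged.
            remapped[stat_kind] = value
    return remapped


# Hypothetical mapping and statistics, invented for this example.
physical_to_display = {"col-a1b2": "id", "col-c3d4": "name"}
stats = {
    "numRecords": 10,
    "minValues": {"col-a1b2": 1, "col-c3d4": "alice"},
    "maxValues": {"col-a1b2": 42, "col-c3d4": "zoe"},
    "nullCount": {"col-a1b2": 0, "col-c3d4": 3},
}

display_stats = remap_stats_to_display_names(stats, physical_to_display)
print(display_stats["minValues"])
```

A real migration would also need to handle nested columns and map the converted statistics onto Iceberg field IDs, which this sketch leaves out.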
