[GitHub] [iceberg] jackye1995 commented on pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

GitBox Wed, 04 Jan 2023 11:47:09 -0800


jackye1995 commented on PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#issuecomment-1371347815


   Looking at the column mapping section of the Delta Lake protocol spec,
   
   ```
   Write data files by using the physical name that is chosen for each column. The physical name of the column is static and can be different than the display name of the column, which is changeable.
   
   Write the 32 bit integer column identifier as part of the field_id field of the SchemaElement struct in the [Parquet Thrift specification](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift).
   
   Track partition values and column level statistics with the physical name of the column in the transaction log.
   ```
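To make the quoted requirements concrete, here is a minimal Python sketch of reading a column-mapped Delta schema. It assumes the input is the JSON `schemaString` of a Delta `metaData` action, with each field's mapping stored in its `metadata` under `delta.columnMapping.id` and `delta.columnMapping.physicalName`. The function name `column_mapping` and the sample schema (including the shortened physical names) are hypothetical, and nested struct fields are not handled.

```python
import json


def column_mapping(schema_string):
    """Return (display_name, physical_name, field_id) for each top-level field.

    Hypothetical helper: assumes a Delta schemaString whose fields may carry
    column-mapping metadata. Nested struct fields are not handled here.
    """
    schema = json.loads(schema_string)
    result = []
    for field in schema["fields"]:
        meta = field.get("metadata", {})
        display = field["name"]  # changeable display name
        # Static name used when writing Parquet files; falls back to the
        # display name when the table does not use column mapping.
        physical = meta.get("delta.columnMapping.physicalName", display)
        # 32-bit id that is also written as the Parquet field_id (None if absent).
        field_id = meta.get("delta.columnMapping.id")
        result.append((display, physical, field_id))
    return result


# Hypothetical schema: display names differ from the physical names in the files.
SCHEMA = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "user_id", "type": "long", "nullable": True,
         "metadata": {"delta.columnMapping.id": 1,
                      "delta.columnMapping.physicalName": "col-5f422f40"}},
        {"name": "event", "type": "string", "nullable": True,
         "metadata": {"delta.columnMapping.id": 2,
                      "delta.columnMapping.physicalName": "col-9c7e11ab"}},
    ],
})

for display, physical, field_id in column_mapping(SCHEMA):
    print(field_id, display, physical)
```

A reader that only knows the display names (`user_id`, `event`) would not find those columns in the data files, which is why the physical name or the field id has to be carried into the migrated table.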
   
   Because the column names in the underlying Parquet files can differ from the table's display names, migrating such a table requires not only an Iceberg name mapping configuration but also converting the statistics retrieved from the Parquet files.
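Both pieces can be sketched in Python under stated assumptions. The sketch assumes each column's (display name, physical name) pair is already known, and that the Iceberg field id for each display name is available from the converted Iceberg schema. It emits name-mapping JSON in the shape Iceberg reads from the `schema.name-mapping.default` table property, and re-keys Delta `add`-action stats (`numRecords`, `minValues`, `maxValues`, `nullCount`, keyed by physical name) by Iceberg field id. The function names are hypothetical, only top-level columns are handled, and bounds are kept as plain Python values rather than the serialized bytes Iceberg actually stores.

```python
import json


def name_mapping_json(columns, iceberg_ids):
    """Build Iceberg name-mapping JSON (hypothetical helper).

    columns: list of (display_name, physical_name) pairs.
    iceberg_ids: dict of display_name -> Iceberg field id (assumed known).
    Each entry lists the physical name so files written with it resolve
    to the right field, plus the display name when it differs.
    """
    mapping = []
    for display, physical in columns:
        names = [physical] if physical == display else [physical, display]
        mapping.append({"field-id": iceberg_ids[display], "names": names})
    return json.dumps(mapping)


def convert_stats(delta_stats_json, columns, iceberg_ids):
    """Re-key Delta add-action stats from physical names to Iceberg field ids.

    Top-level columns only; values stay as plain Python objects.
    """
    stats = json.loads(delta_stats_json)
    physical_to_id = {physical: iceberg_ids[display] for display, physical in columns}

    def rekey(values):
        # Drop any column we have no mapping for instead of guessing.
        return {physical_to_id[name]: v for name, v in values.items() if name in physical_to_id}

    return {
        "record_count": stats.get("numRecords"),
        "lower_bounds": rekey(stats.get("minValues", {})),
        "upper_bounds": rekey(stats.get("maxValues", {})),
        "null_value_counts": rekey(stats.get("nullCount", {})),
    }


# Hypothetical inputs: "user_id" is stored under a physical name, "event" is not.
COLUMNS = [("user_id", "col-1a"), ("event", "event")]
IDS = {"user_id": 1, "event": 2}
STATS = json.dumps({
    "numRecords": 3,
    "minValues": {"col-1a": 10, "event": "click"},
    "maxValues": {"col-1a": 42, "event": "view"},
    "nullCount": {"col-1a": 0, "event": 1},
})

print(name_mapping_json(COLUMNS, IDS))
# [{"field-id": 1, "names": ["col-1a", "user_id"]}, {"field-id": 2, "names": ["event"]}]
print(convert_stats(STATS, COLUMNS, IDS))
```

Keeping the stats conversion separate from the name mapping reflects the point above: the mapping alone lets Iceberg read the files, but the per-file metrics still arrive keyed by physical names and have to be translated.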
   
   This sounds like something that can be added as a next step after this PR is merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


