Re: [I] Hidden partitioning clarification: do `identity` partition columns need to be in data files? [iceberg]

via GitHub Tue, 23 Dec 2025 12:21:28 -0800


amogh-jahagirdar commented on issue #14914:
URL: https://github.com/apache/iceberg/issues/14914#issuecomment-3687858095


   Iceberg does require explicit materialization of all columns in the data file, 
including those used in the partition spec. Partitioning is ultimately a 
transformation or derivation of one or more source columns, but materializing 
those columns in the data files is a useful safeguard in case the metadata gets 
corrupted.
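   To illustrate "partitioning is a derivation of a source column", here is a 
minimal Python sketch of two Iceberg transforms, `identity` and `day` (whole days 
since 1970-01-01 UTC). The function names are hypothetical and this is not the 
Iceberg library API. Note that `identity` is lossless, which is why storing the 
column again can look redundant, while a transform like `day` throws information 
away.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def identity_transform(value):
    # Hypothetical helper: identity keeps the source value unchanged.
    return value


def day_transform(ts: datetime) -> int:
    # Hypothetical helper: whole days since the Unix epoch (1970-01-01 UTC).
    return (ts - EPOCH).days


row = {
    "region": "us-east",
    "event_ts": datetime(2025, 12, 23, 12, 21, 28, tzinfo=timezone.utc),
}
# The partition tuple is derived from the row; the row itself is not modified.
partition = (identity_transform(row["region"]), day_transform(row["event_ts"]))
print(partition)  # ('us-east', 20445)
```

The time of day in `event_ts` cannot be recovered from `20445`, whereas 
`'us-east'` can be recovered exactly. The spec still requires both source columns 
to be written.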
   
   @JerAguilon Here's the current spec language, from 
https://iceberg.apache.org/spec/#writing-data-files:
   
   ```
   All columns must be written to data files even if they introduce redundancy 
with metadata stored in manifest files (e.g. columns with identity partition 
transforms). Writing all columns provides a backup in case of corruption or 
bugs in the metadata layer.
   ```
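   A minimal sketch of what the "backup" buys you, assuming a toy in-memory model 
where `DataFile`, `write_data_file`, and `verify_partition` are hypothetical names 
(not Iceberg APIs). Because the identity-partitioned column stays in every row, the 
partition value recorded in the manifest can be cross-checked against, or rebuilt 
from, the data file itself.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical stand-in for a data file plus its manifest entry.
    rows: list       # every column is materialized, partition source columns included
    partition: dict  # partition value recorded in the manifest (metadata layer)


def write_data_file(rows, identity_cols):
    # All rows in one data file must share a single identity partition value.
    values = {col: {r[col] for r in rows} for col in identity_cols}
    for col, vals in values.items():
        if len(vals) != 1:
            raise ValueError(f"rows span multiple partitions for {col!r}")
    partition = {col: next(iter(vals)) for col, vals in values.items()}
    # Rows are written unchanged: the partition column is NOT dropped.
    return DataFile(rows=[dict(r) for r in rows], partition=partition)


def verify_partition(df: DataFile) -> bool:
    # Cross-check the manifest's partition value against the materialized column.
    return all(r[col] == val for r in df.rows for col, val in df.partition.items())


rows = [{"region": "us-east", "amount": 10}, {"region": "us-east", "amount": 5}]
df = write_data_file(rows, ["region"])
print(verify_partition(df))  # True
df.partition["region"] = "eu-west"  # simulate a bug in the metadata layer
print(verify_partition(df))  # False
```

If the writer had dropped `region` from the rows, the corrupted manifest value 
would be undetectable and unrecoverable from the data file alone.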
   

