[I] [Spec] Linking Schema ID to Data & Delete Files [iceberg]

via GitHub Mon, 18 Aug 2025 10:05:11 -0700


manirajv06 opened a new issue, #13855:
URL: https://github.com/apache/iceberg/issues/13855


   ### Proposed Change
   
   Schemas evolve over time, so data files written at different points in time
can have different columns. For example, data files created at T1 with schema S1
could have columns C1 to C5, data files created at T2 with schema S2 could have
columns C4 to C10, and so on.
   
   Linking a schema ID to each data and delete file would make that file's
schema details easy to look up. For instance, files could be filtered on
whether a column exists, using the column's field ID: a file whose max field ID
is lower than the column's field ID cannot contain that column. The max field ID
of a file is simply the max field ID of its linked schema, which the schema
already tracks, so it can be used straight away: C5's field ID is the max field
ID for all files linked to S1, and C10's field ID is the max field ID for all
files linked to S2. As another instance, to know whether Parquet files contain
a column of `UnknownType`, every file currently has to be opened, because
neither statistics nor any other metadata records it. With schemas linked to
the files, that information could be read from the schema instead. Other schema
information could be used in the same way, depending on the requirements.
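   To make the two use cases concrete, here is a minimal Java sketch. It assumes a hypothetical `schemaId` recorded on each file (the field this issue proposes; the current spec has no such field) and uses simplified, hypothetical `SchemaSummary` and `FileEntry` records rather than Iceberg's real classes. In Iceberg's Java API the highest field ID of a schema is available from `Schema.highestFieldId()`. The example also assumes column Cn has field ID n.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SchemaLinkedPruning {

  // Hypothetical summary of a table schema: its ID, the highest field ID it
  // assigns, and whether any of its columns has the UnknownType type.
  record SchemaSummary(int schemaId, int highestFieldId, boolean hasUnknownType) {}

  // Hypothetical file entry carrying the ID of the schema it was written with.
  // This schemaId field is what the proposal would add; it is not in the spec today.
  record FileEntry(String path, int schemaId) {}

  // One-sided check: new columns receive field IDs above every existing ID, so a
  // field ID above the linked schema's highest field ID is definitely absent from
  // the file. An ID at or below it may still be absent (e.g. a column dropped
  // before that schema), so `true` only means "may contain".
  static boolean mayContainField(FileEntry file, Map<Integer, SchemaSummary> schemas, int fieldId) {
    SchemaSummary schema = schemas.get(file.schemaId());
    if (schema == null) {
      return true; // unrecognized schema ID: keep the file rather than risk dropping it
    }
    return fieldId <= schema.highestFieldId();
  }

  static List<FileEntry> filesThatMayContain(
      List<FileEntry> files, Map<Integer, SchemaSummary> schemas, int fieldId) {
    return files.stream()
        .filter(f -> mayContainField(f, schemas, fieldId))
        .collect(Collectors.toList());
  }

  // Files whose linked schema has an UnknownType column, found from schema
  // metadata alone without opening any Parquet file. Files with an
  // unrecognized schema ID are kept conservatively.
  static List<FileEntry> filesWithUnknownType(
      List<FileEntry> files, Map<Integer, SchemaSummary> schemas) {
    return files.stream()
        .filter(f -> {
          SchemaSummary schema = schemas.get(f.schemaId());
          return schema == null || schema.hasUnknownType();
        })
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // S1: columns C1..C5 (field IDs 1..5).
    // S2: columns C4..C10 (field IDs 4..10), assumed here to include an UnknownType column.
    Map<Integer, SchemaSummary> schemas = Map.of(
        1, new SchemaSummary(1, 5, false),
        2, new SchemaSummary(2, 10, true));
    List<FileEntry> files = List.of(
        new FileEntry("data/t1-file.parquet", 1),
        new FileEntry("data/t2-file.parquet", 2));

    // C7 (field ID 7) was added after S1, so the T1 file is pruned.
    System.out.println(filesThatMayContain(files, schemas, 7).stream()
        .map(FileEntry::path).collect(Collectors.toList())); // prints [data/t2-file.parquet]
    System.out.println(filesWithUnknownType(files, schemas).stream()
        .map(FileEntry::path).collect(Collectors.toList())); // prints [data/t2-file.parquet]
  }
}
```

Because the max-field-ID comparison is one-sided, it can safely skip files but never proves a column is present; a lookup of field ID 2 would still keep the T2 file even though C2 is not in S2. Confirming presence would need the full field list of the linked schema, which the same link would also make available.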
   
   I would like to propose linking the schema ID to data and delete files, as it
would be useful for file- and schema-related operations going forward.
   
   ### Proposal document
   
   _No response_
   
   ### Specifications
   
   - [ ] Table
   - [ ] View
   - [ ] REST
   - [ ] Puffin
   - [ ] Encryption
   - [ ] Other

