szehon-ho commented on PR #4898:
URL: https://github.com/apache/iceberg/pull/4898#issuecomment-1209652918

   @ConeyLiu that's a good question. I think (I may be wrong) that rewriteDataFiles 
groups files by partition and partition spec, and may not preserve the old schemas. 
I.e., all the data files are rewritten with the latest schema of that partition 
spec.
   
   I think the situation would be the same even with your proposal to add a new 
schema-id field to data_file, right? After rewriteDataFiles, we would have to carry 
over the latest schema-id of each spec for your initially proposed 
optimization to be accurate, because the new file may contain data that 
was written with a later schema.
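
   To illustrate the point, here is a rough sketch. It is not Iceberg code: the `DataFile` class, its `schema_id` field, and the `rewrite` function are hypothetical stand-ins, and it assumes that a larger schema-id means a later schema and that the newest schema-id among the inputs is the one to carry over. It only groups by spec and ignores partitions.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical model, not the Iceberg API: a data file tagged with the
# partition spec it belongs to and the schema it was written with.
@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int
    schema_id: int


def rewrite(files):
    """Compact files per spec; each output carries the newest input schema-id."""
    by_spec = defaultdict(list)
    for f in files:
        by_spec[f.spec_id].append(f)
    return [
        DataFile(
            path=f"rewritten-spec-{spec_id}.parquet",
            spec_id=spec_id,
            # Assumption: the rewritten file may hold rows written with any of
            # the input schemas, so it is tagged with the latest one.
            schema_id=max(f.schema_id for f in group),
        )
        for spec_id, group in sorted(by_spec.items())
    ]


files = [
    DataFile("a.parquet", spec_id=0, schema_id=1),
    DataFile("b.parquet", spec_id=0, schema_id=3),
    DataFile("c.parquet", spec_id=1, schema_id=2),
]
rewritten = rewrite(files)
```

   If the rewritten file for spec 0 kept schema-id 1 instead of 3, a check based on the file's schema-id could wrongly conclude that columns added in schema 3 are absent from it.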


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

