RussellSpitzer edited a comment on pull request #2496:
URL: https://github.com/apache/iceberg/pull/2496#issuecomment-1035897006


   
   > Rewriting datafiles produces a new snapshot, which derives from the latest 
snapshot, with its same schema, so it is sort of orthogonal in the sense that 
the reading behavior is the same.
   
   I think this may be an issue, since all current implementations of rewrite 
start by reading the current state of the data and then writing that output to 
new files. Suppose I have two files, both missing column a, for which I have 
set a default value of 1. Say my optimize rewrite command touches one of these 
files and rewrites it. The read done by the rewrite returns rows with a=1, so 
the replacement data file is written with a=1 filled in. The column is no 
longer missing, so no default will be applied on read. Now if I change the 
default to 2, rows in the unoptimized file will return 2 (the new default for a 
data file missing a), while rows in the optimized file will return 1 (since 
that was the value read while rewriting). I think this would be pretty strange 
behavior, and we should probably figure out how to eliminate it.
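
   To make the scenario above concrete, here is a minimal sketch in plain 
Python. It is a toy model and not Iceberg's API: `DataFile`, `read`, and 
`rewrite` are hypothetical names, and the sketch assumes that a rewrite reads 
through the current schema defaults and materializes every column into the new 
file.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical stand-in for a data file, not an Iceberg class.
    columns: set  # names of the columns physically stored in the file
    rows: list    # one dict per row, keyed by stored column name


def read(data_file, schema):
    """Project rows onto the schema; use the default for columns the file lacks."""
    return [
        {col: (row[col] if col in data_file.columns else default)
         for col, default in schema.items()}
        for row in data_file.rows
    ]


def rewrite(data_file, schema):
    """Naive rewrite (assumed behavior): read via the schema, write all columns."""
    return DataFile(columns=set(schema), rows=read(data_file, schema))


# Schema maps column name -> default value; column "a" starts with default 1.
schema = {"id": None, "a": 1}

# Two files written before column "a" existed, so both are missing it.
file_1 = DataFile({"id"}, [{"id": 1}])
file_2 = DataFile({"id"}, [{"id": 2}])

# The optimize command touches only file_1, which bakes a=1 into the new file.
file_1 = rewrite(file_1, schema)

# The default for "a" is then changed to 2.
schema["a"] = 2

optimized_rows = read(file_1, schema)
unoptimized_rows = read(file_2, schema)

print(optimized_rows)    # [{'id': 1, 'a': 1}]  value frozen at rewrite time
print(unoptimized_rows)  # [{'id': 2, 'a': 2}]  new default applied on read
```

   Under these assumptions, two rows that were logically identical with respect 
to column a before the rewrite now disagree, purely because one file happened 
to be compacted.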


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


