[GitHub] [iceberg] RussellSpitzer edited a comment on pull request #2496: [#2039] Support default value semantics - API changes

GitBox Thu, 10 Feb 2022 21:37:44 -0800


RussellSpitzer edited a comment on pull request #2496:
URL: https://github.com/apache/iceberg/pull/2496#issuecomment-1035897006



   
   > Rewriting datafiles produces a new snapshot, which derives from the latest 
snapshot, with its same schema, so it is sort of orthogonal in the sense that 
the reading behavior is the same.
   
   I think this may be an issue, since all current implementations of rewrite 
start by reading the current state of the data and then writing that output to 
new files. Suppose I have two files, both missing column A, for which I have 
set a default value of 1. Say my optimize rewrite command touches one of these 
files and rewrites it. On read it will return rows with a=1, so the replacement 
data file is now filled in with a=1 and no default needs to be applied. Now if 
I change the default to 2, rows in the unoptimized file will return 2 (the new 
default), while rows in the optimized file will return 1 (since that was the 
value read while rewriting). I think this would be pretty strange behavior, and 
we should probably figure out how to eliminate it.
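
   To make the scenario concrete, here is a toy sketch in plain Python. It is 
not Iceberg code: a "file" is just a list of row dicts, and `read_file` and 
`rewrite_file` are hypothetical helpers that only assume a missing column is 
filled from the schema default at read time and that a rewrite materializes 
whatever was read.

```python
# Toy model only (not the Iceberg API): a "file" is a list of row dicts.

def read_file(rows, column, default):
    """Return rows with `column` filled from `default` wherever it is absent."""
    return [{**row, column: row.get(column, default)} for row in rows]

def rewrite_file(rows, column, default):
    """Rewrite by reading the current state and writing that output back out."""
    return read_file(rows, column, default)

# Two files written before column "a" existed.
file_1 = [{"id": 1}]
file_2 = [{"id": 2}]

# The default for "a" is 1, and the optimize rewrite touches only file_1.
file_1 = rewrite_file(file_1, "a", default=1)

# The default is then changed to 2 and the table is read again.
rows_after_change = read_file(file_1, "a", default=2) + read_file(file_2, "a", default=2)
print(rows_after_change)
# [{'id': 1, 'a': 1}, {'id': 2, 'a': 2}]
```

   The row from the rewritten file keeps a=1 because the value was written 
into the file, while the row from the untouched file picks up the new default.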


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


