RussellSpitzer edited a comment on pull request #2496: URL: https://github.com/apache/iceberg/pull/2496#issuecomment-1035897006
> Rewriting datafiles produces a new snapshot, which derives from the latest snapshot, with its same schema, so it is sort of orthogonal in the sense that the reading behavior is the same.

I think this may be an issue, since all current implementations of rewrite start by reading the current state of the data and then writing that output to new files.

Consider two files, both missing column A, for which I have set a default value of 1. Say my optimize rewrite command touches one of these files and rewrites it. On read, it returns rows with a=1, so the replacement data file is now filled in with a=1 and no default needs to be applied. Now if I change the default to 2, a row in the unoptimized file will return 2 (the new default) for a, while rows in the optimized file will return 1 (since that was the value read while rewriting).

I think this would be pretty strange behavior, and we should probably figure out how to eliminate it.
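
A rough sketch of the scenario, in case it helps make the inconsistency concrete. This is a toy Python model, not Iceberg code: the dict-based "files" and the `read_file` / `rewrite_file` helpers are made up for illustration, and it assumes a rewrite simply reads rows with the current default applied and writes them back out.

```python
# Toy model only: lists of dicts stand in for data files.
# None of these names are Iceberg APIs.

def read_file(rows, column, default):
    """Read rows, filling in `default` wherever `column` is absent from the file."""
    return [{**row, column: row.get(column, default)} for row in rows]

def rewrite_file(rows, column, default):
    """Rewrite by reading the current state and writing that output to a new file.

    The default in effect at rewrite time is materialized into the new file.
    """
    return read_file(rows, column, default)

# Two files, both missing column "a".
file_1 = [{"id": 1}]
file_2 = [{"id": 2}]

default_a = 1
file_1 = rewrite_file(file_1, "a", default_a)  # only file_1 gets optimized

default_a = 2  # the default is changed after the rewrite

rows = read_file(file_1, "a", default_a) + read_file(file_2, "a", default_a)
values_of_a = [row["a"] for row in rows]
print(values_of_a)  # [1, 2]
```

The rewritten file keeps a=1 because the value was physically written, while the untouched file picks up the new default of 2, so the same logical column reads differently depending on which files happened to be compacted.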
