RussellSpitzer commented on pull request #2496: URL: https://github.com/apache/iceberg/pull/2496#issuecomment-1035897006
> Rewriting datafiles produces a new snapshot, which derives from the latest snapshot, with its same schema, so it is sort of orthogonal in the sense that the reading behavior is the same.

I think this may be an issue, since all current implementations of rewrite start by reading the current state of the data and then writing that output to new files. Consider two files that are both missing column A, for which I have set a default value of 1. Say my optimize rewrite command touches one of these files and rewrites it. On read, it will see that A has a value of 1 and return rows with A=1, so the replacement data file is now physically filled in with A=1. Now, if I change the default to 2, rows in the unoptimized file will return A=2 (the new default), while rows in the optimized file will still return A=1. I think this would be pretty strange behavior.
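
To make the scenario concrete, here is a toy sketch of it. It assumes a rewrite that simply reads the current state and writes it back out, with defaults applied at read time. `Table`, `read`, and `rewrite` are hypothetical names for illustration only, not Iceberg APIs.

```python
# Toy model only: none of these names are Iceberg APIs.
from dataclasses import dataclass, field


@dataclass
class Table:
    defaults: dict = field(default_factory=dict)  # column -> read-time default
    files: dict = field(default_factory=dict)     # file name -> list of row dicts

    def read(self, name):
        # Columns absent from the stored row are filled with the current default;
        # values physically present in the file always win.
        return [{**self.defaults, **row} for row in self.files[name]]

    def rewrite(self, name):
        # Assumed rewrite behavior: read the current state, write it to a new file.
        # This materializes whatever default was in effect at rewrite time.
        self.files[name] = self.read(name)


# Two files, both missing column "a", whose default is 1.
table = Table(defaults={"a": 1},
              files={"f1": [{"id": 1}], "f2": [{"id": 2}]})

table.rewrite("f1")       # f1 now physically stores a=1
table.defaults["a"] = 2   # the default is changed afterwards

rows_f1 = table.read("f1")  # rewritten file keeps the old value
rows_f2 = table.read("f2")  # untouched file picks up the new default
```

In this model the two files disagree on `a` after the default changes, even though neither row was ever explicitly written with a value for it.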
