jackye1995 commented on pull request #2096: URL: https://github.com/apache/iceberg/pull/2096#issuecomment-774773226
Works fine for me so far; I tested a few operations across a few clusters running different versions.

One behavior I see: if an old v1 writer is somehow still used to create a new table version after a v2 writer has been used, all schema history is dropped. The next schema written by a v2 writer then gets a schema ID starting from 0 again, which can break time travel queries.

In theory, once a table is written by a v2 writer, it should never be touched by a v1 writer, but we cannot prevent people from doing that in real life, especially in large organizations where people use different versions of the same software all the time. So I feel we still need something like a `last-assigned-schema-id`, so that schemas don't accidentally share the same ID and we can backfill the earlier schemas and their IDs if necessary. However, `last-assigned-schema-id` should not be a table metadata field, because an old writer would drop it. One option is to put it in table properties, but there might be better ways.

What do you think?
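
To make the ID-reuse concern concrete, here is a minimal toy sketch in Python. It is not Iceberg code: the dict-based metadata layout and the `v1_write`, `v2_write`, and `replay` helpers are hypothetical, and it assumes an old writer passes table properties through unchanged while dropping the schema list. It only illustrates how a high-water mark kept in table properties could stop schema IDs from restarting at 0.

```python
# Hypothetical property key, named after the field suggested in the comment.
PROP_KEY = "last-assigned-schema-id"


def v2_write(metadata: dict, schema: str, use_property: bool) -> dict:
    """Toy v2 writer: appends a schema to the history with a fresh ID."""
    schemas = list(metadata.get("schemas", []))
    props = dict(metadata.get("properties", {}))
    # Highest ID still visible in the schema history (-1 if history is gone).
    last_id = max((s["schema-id"] for s in schemas), default=-1)
    if use_property:
        # Also consult the high-water mark kept in table properties.
        last_id = max(last_id, int(props.get(PROP_KEY, -1)))
    new_id = last_id + 1
    schemas.append({"schema-id": new_id, "schema": schema})
    if use_property:
        props[PROP_KEY] = str(new_id)
    return {"schemas": schemas, "current-schema-id": new_id, "properties": props}


def v1_write(metadata: dict, schema: str) -> dict:
    """Toy v1 writer: knows nothing about the schema list, so the history is
    dropped. Assumption: table properties are carried over untouched."""
    return {"schema": schema, "properties": dict(metadata.get("properties", {}))}


def replay(use_property: bool) -> int:
    """v2, v2, v1, v2 write sequence; returns the ID given to the last schema."""
    m = v2_write({}, "a", use_property)
    m = v2_write(m, "a,b", use_property)
    m = v1_write(m, "a,b,c")  # old writer wipes the schema history
    m = v2_write(m, "a,b,c,d", use_property)
    return m["current-schema-id"]


if __name__ == "__main__":
    print("without property:", replay(False))
    print("with property:", replay(True))
```

In this toy model the final schema gets ID 0 without the property, colliding with the first schema, and ID 2 with it. Whether a real v1 writer preserves unknown table properties in every code path would need to be verified before relying on this approach.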
