[GitHub] [iceberg] jackye1995 commented on pull request #2096: Core: add schema id and schemas to table metadata

GitBox Sun, 07 Feb 2021 13:43:08 -0800


jackye1995 commented on pull request #2096:
URL: https://github.com/apache/iceberg/pull/2096#issuecomment-774773226



   Works fine for me so far. I tested a few operations across a few clusters
running different versions.
   
   One behavior I see: if an old v1 writer is somehow still used to create a
new table version after a v2 writer has been used, all schema history is
dropped. The next schema written by a v2 writer then gets a schema id starting
from 0 again, which can break time travel queries.
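   
   A toy model of that sequence, as a rough sketch only: plain dicts stand in
for table metadata, and `v1_write`, `v2_write`, and `V1_KNOWN_FIELDS` are
hypothetical names rather than Iceberg APIs. It assumes the old writer rewrites
only the top-level fields it knows about.

```python
# Toy model only: dicts stand in for table metadata; nothing here is Iceberg code.
V1_KNOWN_FIELDS = {"format-version", "schema", "properties"}  # assumed set


def v2_write(metadata, new_schema):
    """New-style writer: keep every schema and give the new one the next id."""
    schemas = list(metadata.get("schemas", []))
    next_id = max((s["schema-id"] for s in schemas), default=-1) + 1
    schemas.append({"schema-id": next_id, "fields": new_schema})
    return {**metadata, "schema": new_schema,
            "schemas": schemas, "current-schema-id": next_id}


def v1_write(metadata, new_schema):
    """Old-style writer: copies only the fields it knows, so history is lost."""
    kept = {k: v for k, v in metadata.items() if k in V1_KNOWN_FIELDS}
    kept["schema"] = new_schema
    return kept


m = {"properties": {}}
m = v2_write(m, ["id"])                 # assigned schema id 0
m = v2_write(m, ["id", "data"])         # assigned schema id 1
before = {s["schema-id"]: s["fields"] for s in m["schemas"]}

m = v1_write(m, ["id", "data", "ts"])   # "schemas" is dropped here
m = v2_write(m, ["id", "ts"])           # id assignment restarts at 0
after = {s["schema-id"]: s["fields"] for s in m["schemas"]}

print(before[0], after[0])
```

   In this model, id 0 names two different schemas before and after the old
writer touches the table, which is the collision described above.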
   
   In theory, once a table is written by a v2 writer, it should never be
touched by a v1 writer, but we cannot prevent people from doing that in real
life, especially when the organization is large and people use different
versions of the same software all the time. So I feel we still need something
like a `last-assigned-schema-id`, so that schemas don't accidentally share
the same id and we can backfill the earlier schemas and their IDs if necessary.
But `last-assigned-schema-id` should not be a table metadata field, because it
would be dropped by an old writer. One option is to put it in table
properties, but there might be better ways. What do you think?
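
   A minimal sketch of the table-properties option, under stated assumptions:
dicts stand in for table metadata, the function names are hypothetical, and the
old writer is assumed to carry `properties` through unchanged while dropping
top-level fields it does not know. This is an illustration of the idea, not a
proposed implementation.

```python
# Toy model only: dicts stand in for table metadata; nothing here is Iceberg code.
PROP = "last-assigned-schema-id"  # property key suggested in this comment


def next_schema_id(metadata):
    """Take the larger of the ids still present and the id recorded in properties."""
    schemas = metadata.get("schemas", [])
    highest_known = max((s["schema-id"] for s in schemas), default=-1)
    highest_recorded = int(metadata.get("properties", {}).get(PROP, -1))
    return max(highest_known, highest_recorded) + 1


def v2_write(metadata, new_schema):
    """New-style writer: assign an id and record it in table properties."""
    new_id = next_schema_id(metadata)
    schemas = list(metadata.get("schemas", []))
    schemas.append({"schema-id": new_id, "fields": new_schema})
    props = {**metadata.get("properties", {}), PROP: str(new_id)}
    return {**metadata, "schema": new_schema, "schemas": schemas,
            "current-schema-id": new_id, "properties": props}


def v1_write(metadata, new_schema):
    """Old-style writer (assumed): keeps properties, drops the schema list."""
    return {"schema": new_schema,
            "properties": dict(metadata.get("properties", {}))}


m = {"properties": {}}
m = v2_write(m, ["id"])                 # id 0, property set to "0"
m = v2_write(m, ["id", "data"])         # id 1, property set to "1"
m = v1_write(m, ["id", "data", "ts"])   # schema list lost, property survives
m = v2_write(m, ["id", "ts"])           # continues from the recorded id

print(m["current-schema-id"], m["properties"][PROP])
```

   The schema written by the old writer still has no id in this sketch, so
backfilling earlier schemas would need separate handling.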


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

