yyanyy commented on pull request #2096:
URL: https://github.com/apache/iceberg/pull/2096#issuecomment-775597041


   > Works fine for me so far, tested a few operations among a few clusters of 
different versions.
   > 
   > One behavior I see is that, if an old v1 writer is somehow still used to 
create a new table version after a v2 writer has been used, all schema history 
will be dropped, so the next schema written by a v2 writer will have its schema 
ID starting from 0 again, which can break time travel queries.
   > 
   > In theory, once a table is written by a v2 writer, it should never be 
touched by a v1 writer, but we cannot prevent people from doing that in real 
life, especially when the organization is large and people use different 
versions of the same software all the time. So I feel we still need something 
like a `last-assigned-schema-id`, so that schemas don't accidentally share 
the same ID and we can backfill the earlier schemas and their IDs if necessary. 
But `last-assigned-schema-id` should not be a table metadata field, because it 
would be dropped by an old writer. One option is to put it in table 
properties, but there might be better ways. What do you think?
   
   Thanks for putting effort into testing this! 
   
   I think the problem you described seems similar to what I raised in my 
question 1, and I think we might be fine with it, since the IDs within a 
metadata file should always be consistent, and we don't expose them in metadata 
tables, as Ryan mentioned above. I don't think it would break time travel 
queries: after the old writer writes the data, the schema ID associated with 
the snapshot is dropped, in which case we will likely need to rely on #1508 
for time travel? 
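   
   To make the scenario concrete, here is a minimal sketch of the ID reuse 
described in the quoted comment, and of how a counter kept in table properties 
could avoid it. This is a toy model, not Iceberg code: `v1_write`, `v2_write`, 
the dict layout, and the `use_property` flag are all hypothetical, and it 
assumes an old writer preserves table properties while dropping metadata 
fields it does not know about.

```python
# Toy model only; none of these functions are Iceberg APIs.
# Assumed property key, taken from the suggestion quoted above.
LAST_ID_PROP = "last-assigned-schema-id"


def v2_write(metadata, new_schema, use_property=True):
    """A v2 writer adds a schema and assigns it the next free schema ID."""
    schemas = dict(metadata.get("schemas", {}))
    props = dict(metadata.get("properties", {}))
    # Fallback: the highest ID still present in the schema list (-1 if none).
    last_id = max(schemas, default=-1)
    if use_property and LAST_ID_PROP in props:
        # The counter in table properties survives a v1 rewrite (assumption).
        last_id = max(last_id, int(props[LAST_ID_PROP]))
    new_id = last_id + 1
    schemas[new_id] = new_schema
    if use_property:
        props[LAST_ID_PROP] = str(new_id)
    return {"schemas": schemas, "current-schema-id": new_id, "properties": props}


def v1_write(metadata, new_schema):
    """An old writer keeps a single schema and the properties, nothing else."""
    return {"schema": new_schema, "properties": dict(metadata.get("properties", {}))}


# Two v2 commits, then a v1 commit that drops the schema history.
table = v2_write({}, "schema-a")
table = v2_write(table, "schema-b")
table = v1_write(table, "schema-c")

# Next v2 commit, with and without the counter in table properties.
with_counter = v2_write(table, "schema-d")
without_counter = v2_write(table, "schema-d", use_property=False)
```

   In this model `with_counter` continues at schema ID 2, while 
`without_counter` starts again at 0, which is the collision the quoted comment 
is worried about. Whether that reuse matters in practice depends on whether 
anything outside a single metadata file refers to those IDs.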
   




