rdblue commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-951054059


   I've been thinking about this case, and I think the right way to handle it 
is to set the sequence number on individual data files rather than at the 
snapshot level. I don't think we should change the sequence number of the 
snapshot or the manifest list. Basically, I agree with @yyanyy's 
[comment](https://github.com/apache/iceberg/issues/2308#issuecomment-801330831):
   
   > I think there are two seqNum concepts here: seqNum for the table/commit 
and seqNum for the file. I think it's a reasonable approach to mark the 
rewritten files with the old seqNum, but I'm not sure if we necessarily need to 
use an old sequence number for the commit since they are stored as part of the 
snapshot, and suddenly have an old seqNum within the snapshot could be 
confusing even if there's no other implication.
   
   I think we need to set the file sequence numbers. That raises the question: 
what do we set them to? Ideally, we would use the latest, but the delete file 
commit has already claimed that number, so we need to go with a sequence number 
that is lower than that of any commit made after the rewrite started. That 
would be the sequence number that was current when the rewrite operation 
started. It would be nice to find a way around reusing a sequence number from a 
different snapshot, but I don't see a good way to do that right now. We can 
possibly fix that up later by skipping sequence numbers.
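
   To make the reasoning concrete, here is a minimal sketch in Python. It is a 
toy model, not Iceberg code: the class names, the `delete_applies` helper, and 
the sequence numbers 5, 6, and 7 are all made up for illustration. It assumes 
the rule that an equality delete applies only to data files with a strictly 
lower sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    sequence_number: int


@dataclass(frozen=True)
class EqualityDeleteFile:
    path: str
    sequence_number: int


def delete_applies(delete: EqualityDeleteFile, data_file: DataFile) -> bool:
    # Assumed rule: an equality delete only affects data files whose
    # sequence number is strictly lower than the delete's.
    return data_file.sequence_number < delete.sequence_number


# Hypothetical timeline: the rewrite starts when the table is at sequence 5.
start_seq = 5
# A concurrent delete commit lands first and claims sequence 6.
concurrent_delete = EqualityDeleteFile("delete-1.parquet", 6)
# The rewrite commits afterwards, so its snapshot gets sequence 7.
snapshot_seq = 7

# Option A: the rewritten file inherits the snapshot's sequence number.
inherited = DataFile("compacted.parquet", snapshot_seq)
# Option B: the rewritten file is pinned to the starting sequence number.
pinned = DataFile("compacted.parquet", start_seq)

delete_seen_when_inherited = delete_applies(concurrent_delete, inherited)
delete_seen_when_pinned = delete_applies(concurrent_delete, pinned)

print(delete_seen_when_inherited, delete_seen_when_pinned)
```

   In this model the concurrent delete no longer applies to the file that 
inherits sequence 7, so the deleted rows would reappear, while the file pinned 
to sequence 5 is still covered by it. The snapshot itself keeps sequence 7 in 
both cases.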


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


