rdblue commented on pull request #3204: URL: https://github.com/apache/iceberg/pull/3204#issuecomment-951054059
I've been thinking about this case, and I think the right way to handle it is to set the sequence number on individual files rather than at the snapshot level. I don't think we should change the sequence number of the snapshot or the manifest list; we should only set the sequence numbers of the individual data files. Basically, I agree with @yyanyy's [comment](https://github.com/apache/iceberg/issues/2308#issuecomment-801330831):

> I think there are two seqNum concepts here: seqNum for the table/commit and seqNum for the file. I think it's a reasonable approach to mark the rewritten files with the old seqNum, but I'm not sure if we necessarily need to use an old sequence number for the commit since they are stored as part of the snapshot, and suddenly have an old seqNum within the snapshot could be confusing even if there's no other implication.

So I think we need to set the file sequence numbers. That raises the question: what do we set them to? Ideally, we would use the latest one, but the delete file commit has already claimed that number, so we need a sequence number that is lower than those of all the other commits. That is the sequence number that was current when the rewrite operation started.

It would be nice to find a way to avoid reusing a sequence number from a different snapshot, but I don't see a good way to do that right now. We can possibly fix that later by skipping sequence numbers.
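To make the ordering argument concrete, here is a toy model (not Iceberg's API; `DataFile`, `EqualityDeleteFile`, `commit_rewrite`, the file paths, and the sequence numbers 10/11/12 are all hypothetical). It relies on the Iceberg v2 rule that an equality delete applies only to data files whose sequence number is strictly lower than the delete's. The snapshot is always committed at the new sequence number; only the file-level number changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    data_seq: int  # file-level sequence number, independent of the snapshot's


@dataclass(frozen=True)
class EqualityDeleteFile:
    path: str
    data_seq: int


def equality_delete_applies(data: DataFile, delete: EqualityDeleteFile) -> bool:
    # Equality deletes apply only to data files with a strictly lower sequence number.
    return data.data_seq < delete.data_seq


def commit_rewrite(starting_seq: int, commit_seq: int, use_starting_seq: bool) -> DataFile:
    """Hypothetical helper: build the compacted file written by a rewrite.

    The rewrite snapshot itself is committed at commit_seq either way;
    only the sequence number stamped on the new data file differs.
    """
    file_seq = starting_seq if use_starting_seq else commit_seq
    return DataFile(path="compacted-0.parquet", data_seq=file_seq)


# The table is at sequence number 10 when the rewrite starts.
starting_seq = 10
# A delete commit lands first and claims sequence number 11.
concurrent_delete = EqualityDeleteFile(path="eq-delete-0.parquet", data_seq=11)
# The rewrite commits afterwards, so its snapshot gets sequence number 12.
rewrite_commit_seq = 12

naive = commit_rewrite(starting_seq, rewrite_commit_seq, use_starting_seq=False)
proposed = commit_rewrite(starting_seq, rewrite_commit_seq, use_starting_seq=True)

# File stamped with the commit's number: the delete no longer applies,
# so rows it removed would reappear after compaction.
print("naive:", equality_delete_applies(naive, concurrent_delete))
# File stamped with the starting number: the delete still applies.
print("proposed:", equality_delete_applies(proposed, concurrent_delete))
```

In this model, the compacted file carries sequence number 10. That number also belongs to the snapshot that was current when the rewrite began, which is the reuse the comment says it would like to avoid.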
