openinx commented on issue #2308: URL: https://github.com/apache/iceberg/issues/2308#issuecomment-801576874
@stevenzwu I will publish the PR for the seqNum changes today; it's a very simple change.

@yyanyy

> I think it's a reasonable approach to mark the rewritten files with the old seqNum, but I'm not sure if we necessarily need to use an old sequence ....

Yeah, it will write the manifest/data/delete files with the old seqNum, but the newly created snapshot will use an increased seqNum, because we validate that the seqNum MUST be increasing across snapshots.

> since it's possible that there could be separate concurrent jobs that do rewrites from equality to positional deletes?

For the rewrite action that converts equality deletes to position deletes, we could still use the same approach, i.e. commit the rewrite txn with the old seqNum, because the newly produced pos-deletes apply to data files that do not change.

The position-delete issue that we discussed before is:

    Seq1: (RowDelta 1)
        INSERT, <1, A>
        INSERT, <2, B>
        DELETE, <1, A>
    Seq2: (RowDelta 2)
        POS-DELETE, <2, B>    <---- here is a position deletion
    Seq3: (Rewrite)
        INSERT, <2, B>

The committed pos-delete cannot mask the newly generated files that are committed by the later rewrite action. In this case we have to retry the whole rewrite process.

> using sequence number of the files to determine if a certain index file is current, or outdated.

I am sorry that I was absent from the meeting I promised to attend. I misunderstood it as 1pm Beijing time (the correct time is actually 12am~1am Beijing time). Is it possible to combine the snapshot seqNum with the file's seqNum to solve this problem? Actually, I did not fully understand the background here.
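
To make the position-delete case from the discussion above concrete, here is a minimal sketch. It assumes a simplified model in which a position delete applies to a data file only when it points at that file's path and the data file's sequence number is not greater than the delete's. The class names, file paths, and `pos_delete_applies` are hypothetical illustrations, not Iceberg APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str  # hypothetical file path
    seq: int   # sequence number assigned to the data file


@dataclass(frozen=True)
class PosDelete:
    target_path: str  # data file the delete points at
    pos: int          # row position inside that data file
    seq: int          # sequence number assigned to the delete file


def pos_delete_applies(delete: PosDelete, data_file: DataFile) -> bool:
    """Simplified rule (an assumption of this sketch): the delete must
    reference this exact file, and the file must not be newer than the delete."""
    return delete.target_path == data_file.path and data_file.seq <= delete.seq


# Seq1: RowDelta 1 writes the original data file; <2, B> sits at position 1.
original = DataFile(path="data-1.parquet", seq=1)

# Seq2: RowDelta 2 commits a position delete for <2, B> in the original file.
delete = PosDelete(target_path="data-1.parquet", pos=1, seq=2)

# Seq3: a rewrite planned before Seq2 re-emits <2, B> into a new file.
rewritten_new_seq = DataFile(path="data-rewritten.parquet", seq=3)
# The same rewrite, but with the new file marked with the old seqNum.
rewritten_old_seq = DataFile(path="data-rewritten.parquet", seq=1)

print(pos_delete_applies(delete, original))           # delete masks the old file
print(pos_delete_applies(delete, rewritten_new_seq))  # newer seq and other path
print(pos_delete_applies(delete, rewritten_old_seq))  # seq is fine, path is not
```

In this model the delete never reaches the rewritten file, whichever seqNum it carries, because the delete is bound to the old file's path and row position. That is consistent with the point above that the whole rewrite has to be retried in this case.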
