openinx commented on issue #2308:
URL: https://github.com/apache/iceberg/issues/2308#issuecomment-801576874


   @stevenzwu I will publish the PR for the seqNum changes today; it's a very simple change.
   
   @yyanyy 
   > I think it's a reasonable approach to mark the rewritten files with the 
old seqNum, but I'm not sure if we necessarily need to use an old sequence ....
   
   Yeah, it will write the manifest/data/delete files with the old seqNum, but the newly created snapshot will still get an increased seqNum, because we validate that snapshot seqNums MUST increase.
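
   A minimal toy model of the behaviour described above, assuming a simplified table where each commit only adds files. `ToyTable`, `append`, `rewrite` and `old_seq` are hypothetical names for illustration, not Iceberg's Java API. The only point is that the snapshot seqNum keeps increasing while the rewritten files carry the old one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    data_seq: int  # seqNum recorded on the file; decides which deletes apply to it


@dataclass(frozen=True)
class Snapshot:
    seq: int      # snapshot seqNum; must increase across commits
    added: tuple  # files added by this snapshot (simplified: removals not tracked)


class ToyTable:
    """Hypothetical in-memory model for illustration, not the Iceberg API."""

    def __init__(self):
        self.last_seq = 0
        self.snapshots = []

    def _add_snapshot(self, seq, added):
        # The validation mentioned above: a new snapshot's seqNum must be
        # greater than the last committed one.
        if seq <= self.last_seq:
            raise ValueError(f"snapshot seq {seq} must be > {self.last_seq}")
        self.last_seq = seq
        self.snapshots.append(Snapshot(seq, tuple(added)))

    def append(self, paths):
        # Normal commit: files inherit the new snapshot's seqNum.
        seq = self.last_seq + 1
        added = [DataFile(p, seq) for p in paths]
        self._add_snapshot(seq, added)
        return added

    def rewrite(self, paths, old_seq):
        # Rewrite commit: the snapshot still gets a fresh seqNum, but the
        # rewritten files are stamped with the old seqNum.
        if old_seq > self.last_seq:
            raise ValueError("old_seq must refer to an already committed seqNum")
        added = [DataFile(p, old_seq) for p in paths]
        self._add_snapshot(self.last_seq + 1, added)
        return added


table = ToyTable()
table.append(["a.parquet"])  # snapshot seq 1
table.append(["b.parquet"])  # snapshot seq 2
compacted = table.rewrite(["ab.parquet"], old_seq=1)
print(table.last_seq, compacted[0].data_seq)  # 3 1
```

   Trying to commit a snapshot that reuses seq 2 raises `ValueError`, which is why only the files, not the snapshot, can keep the old seqNum.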
   
   > since it's possible that there could be separate concurrent jobs that do 
rewrites from equality to positional deletes?
   
   For the rewrite action that converts equality deletes to position deletes, we could still use the same approach, i.e. commit the rewrite txn with the old seqNum, because the newly produced pos-deletes only apply to data files that do not change. The position-delete issue that we discussed before is:
   
   Seq1: (RowDelta 1)
   INSERT, <1, A>
   INSERT, <2, B>
   DELETE, <1, A>
   
   Seq2: (RowDelta 2)
   POS-DELETE, <2, B> <---- here is a position deletion
   
   Seq3: (Rewrite)
   INSERT, <2,B>
   
   The committed pos-delete cannot mask the rows in the new files committed by the later rewrite action, because it points at the old data file. In this case we have to retry the whole rewrite process.
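
   To make the example concrete, here is a toy Python model of delete application (all names are hypothetical, not Iceberg's API). It assumes the v2 spec rules as I understand them: a position delete applies to a data file only when the file path matches and the file's data seqNum is <= the delete's, while an equality delete applies when the key matches and the file's data seqNum is strictly < the delete's. It also assumes Seq1's `DELETE <1, A>` is written as a position delete against the file from the same commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    data_seq: int
    rows: tuple  # rows in file order; the tuple index is the row position


@dataclass(frozen=True)
class PosDelete:
    data_seq: int
    path: str  # data file that holds the deleted row
    pos: int   # row position inside that file


@dataclass(frozen=True)
class EqDelete:
    data_seq: int
    key: int  # matches on the first column (the id)


def applies(d, f, pos, row):
    if isinstance(d, PosDelete):
        # Position delete: same path and position, data seq <= delete seq.
        return d.path == f.path and d.pos == pos and f.data_seq <= d.data_seq
    # Equality delete: same key, data seq strictly < delete seq.
    return d.key == row[0] and f.data_seq < d.data_seq


def live_rows(data_files, deletes):
    return [row for f in data_files for pos, row in enumerate(f.rows)
            if not any(applies(d, f, pos, row) for d in deletes)]


f1 = DataFile("f1.parquet", 1, ((1, "A"), (2, "B")))  # Seq1 inserts
seq1_del = PosDelete(1, "f1.parquet", 0)              # Seq1 DELETE <1, A> (assumed pos-delete)
seq2_pos = PosDelete(2, "f1.parquet", 1)              # Seq2 POS-DELETE <2, B>
# Seq3 rewrite planned before Seq2 committed: f1 minus Seq1 deletes, old seq 1.
f2 = DataFile("f2.parquet", 1, ((2, "B"),))

before = live_rows([f1], [seq1_del, seq2_pos])
after = live_rows([f2], [seq1_del, seq2_pos])         # <2, B> comes back
after_eq = live_rows([f2], [seq1_del, EqDelete(2, 2)])
print(before, after, after_eq)  # [] [(2, 'B')] []
```

   With an equality delete on key 2 the old-seqNum rewrite stays correct, since the rewritten file's seqNum is still below the delete's. With a position delete the row is resurrected, because the delete references `f1.parquet`, so the rewrite has to be retried.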
   
   > using sequence number of the files to determine if a certain index file is 
current, or outdated. 
   
   I am sorry that I missed the meeting I promised to attend. I mistook it for 1pm Beijing time (the correct time was actually 12am~1am Beijing time). Is it possible to combine the snapshot seqNum with the file's seqNum to solve this problem? Actually, I did not fully understand the background.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.