yyanyy commented on issue #2308:
URL: https://github.com/apache/iceberg/issues/2308#issuecomment-801330831


   I think there are two seqNum concepts here: the seqNum for the table/commit 
and the seqNum for the file. I think it's a reasonable approach to mark the 
rewritten files with the old seqNum, but I'm not sure we necessarily need to 
use an old sequence number for the commit itself. The commit's seqNum is stored 
as part of the snapshot, and suddenly having an old seqNum within the snapshot 
could be confusing even if there's no other implication.
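   
   A rough sketch of the distinction, to make it concrete. The class and 
function names below are hypothetical and are not Iceberg's API; the sketch 
only assumes that each data file and each snapshot carries its own seqNum.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    path: str
    sequence_number: int  # file-level seqNum


@dataclass(frozen=True)
class Snapshot:
    sequence_number: int  # table/commit-level seqNum
    files: List[DataFile]


def commit_rewrite(previous: Snapshot, old: DataFile, new_path: str) -> Snapshot:
    """Replace `old` with a rewritten file (hypothetical helper).

    The rewritten file keeps the old file-level seqNum, while the commit
    itself still takes the next table-level seqNum.
    """
    kept = [f for f in previous.files if f != old]
    rewritten = DataFile(new_path, old.sequence_number)
    return Snapshot(previous.sequence_number + 1, kept + [rewritten])
```

   With this split, the snapshot's seqNum keeps increasing on every commit and 
only the rewritten file carries the older number.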
   
   > the equality write path don't introduce any position delete files that 
will be applied to the old files. So we don't have to fallback to re-do the 
whole rewrite action if conflicts happen
   
   Do we want to skip the check on commit conflicts entirely? I think we always 
want to do the check, but fail the commit only when there are positional 
deletes, even for streaming systems, since there could be separate concurrent 
jobs that rewrite equality deletes into positional deletes.
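   
   As a minimal sketch of that rule: the check always runs, and it reports a 
conflict only when a concurrently committed positional delete targets one of 
the rewritten files. `DeleteFile`, `referenced_paths`, and `rewrite_conflicts` 
are hypothetical names for illustration, and real delete metadata may not list 
its target paths this directly.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class DeleteFile:
    kind: str  # "equality" or "position"
    # Data file paths a positional delete targets (assumed to be known here);
    # left empty for equality deletes.
    referenced_paths: frozenset = frozenset()


def rewrite_conflicts(rewritten_paths: Set[str],
                      concurrent_deletes: Iterable[DeleteFile]) -> bool:
    """Return True only if a concurrent positional delete hits a rewritten file."""
    for delete in concurrent_deletes:
        if delete.kind == "position" and delete.referenced_paths & rewritten_paths:
            return True
    # Equality deletes never conflict in this sketch: they match by value, so
    # they would still apply to the rewritten file that keeps the old seqNum.
    return False
```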
   
   Another thing worth noting: there was a meeting with the community earlier 
today about designing secondary indexes. When we discussed rolling up secondary 
indexes to the partition/global level (anything higher than a file-level index 
file), Anton @aokolnychyi brought up the idea of using the sequence number of 
the files to determine whether a given index file is current or outdated. If we 
implement the change in this issue's proposal, we couldn't achieve that: the 
data file path will be different after the rewrite but the sequence number 
stays the same, so the system that builds the index won't know that it has to 
update the index file, which will eventually cause a discrepancy.
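   
   To illustrate the discrepancy, here is a small sketch. It assumes a 
rolled-up index that records the highest file seqNum it has covered and treats 
itself as current when no live file has a higher one. That freshness rule, and 
the names `RollupIndex`, `looks_current`, and `missing_paths`, are assumptions 
for illustration, not the design discussed in the meeting.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class RollupIndex:
    built_at_seq: int         # highest file seqNum covered when the index was built
    indexed_paths: frozenset  # data file paths the index actually points at


def looks_current(index: RollupIndex, live_files: Dict[str, int]) -> bool:
    """seqNum-only freshness check (assumed rule): no live file is newer than the index."""
    return all(seq <= index.built_at_seq for seq in live_files.values())


def missing_paths(index: RollupIndex, live_files: Dict[str, int]) -> Set[str]:
    """Live data files that the index does not reference."""
    return set(live_files) - index.indexed_paths


# The index was built over a.parquet (seqNum 2) and b.parquet (seqNum 4).
index = RollupIndex(4, frozenset({"a.parquet", "b.parquet"}))
# a.parquet is then rewritten to a new path but keeps seqNum 2.
live = {"a-rewritten.parquet": 2, "b.parquet": 4}
```

   Here `looks_current(index, live)` is true even though 
`missing_paths(index, live)` contains the rewritten file, which is the 
situation the index builder would not detect.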




