yyanyy commented on issue #2308: URL: https://github.com/apache/iceberg/issues/2308#issuecomment-801330831
I think there are two seqNum concepts here: the seqNum for the table/commit and the seqNum for the file. Marking the rewritten files with the old seqNum seems like a reasonable approach, but I'm not sure we necessarily need to use an old sequence number for the commit. Commit sequence numbers are stored as part of the snapshot, and suddenly having an old seqNum within the snapshot could be confusing even if there's no other implication.

> the equality write path don't introduce any position delete files that will be applied to the old files. So we don't have to fallback to re-do the whole rewrite action if conflicts happen

Do we want to skip the check on commit conflicts entirely? I think we always want to do the check, and just not fail the commit unless there are positional deletes. That applies even to streaming systems, since there could be separate concurrent jobs that rewrite equality deletes into positional deletes.

Another thing worth noting: there was a meeting with the community earlier today about designing secondary indexes. When we discussed rolling up secondary indexes to the partition/global level (anything higher than a file-level index file), Anton @aokolnychyi brought up the idea of using the sequence number of the files to determine whether a certain index file is current or outdated. If we implement the change in this issue's proposal, we couldn't achieve that: the data file path will be different after the rewrite but the sequence number stays the same, so the system that builds the index won't know that it has to update the index file, which will eventually cause a discrepancy.
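To make the index concern concrete, here is a rough sketch in Python. It is not Iceberg code: `DataFile`, `RolledUpIndex`, `build_index`, `looks_current`, and `is_actually_current` are all hypothetical names, and it assumes a rolled-up index only remembers the highest file sequence number it covered when it was built.

```python
from dataclasses import dataclass


# Hypothetical model, not Iceberg's API: a data file identified by its path
# and carrying a file-level sequence number.
@dataclass(frozen=True)
class DataFile:
    path: str
    seq_num: int


# Hypothetical rolled-up (partition/global level) index. Assumption: it records
# the highest file sequence number it covered, plus the paths it indexed.
@dataclass(frozen=True)
class RolledUpIndex:
    built_at_seq_num: int
    indexed_paths: frozenset


def build_index(files):
    return RolledUpIndex(
        built_at_seq_num=max(f.seq_num for f in files),
        indexed_paths=frozenset(f.path for f in files),
    )


def looks_current(index, files):
    """Sequence-number-only check: current if no file is newer than the index."""
    return all(f.seq_num <= index.built_at_seq_num for f in files)


def is_actually_current(index, files):
    """Ground truth: the index is current only if it covers exactly these paths."""
    return index.indexed_paths == frozenset(f.path for f in files)


before = [DataFile("data/a.parquet", 3), DataFile("data/b.parquet", 4)]
index = build_index(before)

# Rewrite of b.parquet that keeps the old file sequence number (the proposal).
keeps_old_seq = [DataFile("data/a.parquet", 3), DataFile("data/b-rewritten.parquet", 4)]
# The same rewrite, but the new file gets a new sequence number.
gets_new_seq = [DataFile("data/a.parquet", 3), DataFile("data/b-rewritten.parquet", 5)]

print(looks_current(index, keeps_old_seq))        # True: the index seems fresh
print(is_actually_current(index, keeps_old_seq))  # False: it points at a replaced path
print(looks_current(index, gets_new_seq))         # False: staleness is detected
```

With the old sequence number preserved, the sequence-number check reports the index as current even though it still references a path that no longer exists, which is the discrepancy I'm worried about.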
