rdblue commented on issue #2308: URL: https://github.com/apache/iceberg/issues/2308#issuecomment-844349495
I just read through this so I think I'm caught up. Sounds like we are discussing two problems, one short term and one long term. The short term issue is that we have a correctness problem when a row delete is committed while a rewrite is happening. I think we should fix this problem right away using the approach that @stevenzwu suggested: rewrite should validate that there are no new delete files for the partitions that are rewritten. If there are new delete files, then the rewrite should fail. This will prevent the rewrite from succeeding right now, but I think it is important to fix correctness issues as soon as possible. The long-term issue is how to modify the rewrite operation so that we can avoid the conflict with new delete files. The approach discussed above is what we originally intended to use when we were designing the v2 metadata: a rewrite should track data files with an _older_ sequence number so that new delete files are applied correctly. That works for equality deletes, but not for position deletes. Since we're primarily concerned about streaming use cases, I think that is okay. We can continue to fail the rewrite if new position deletes are added. Also, we can't update the sequence number of the new delete files. We have to change the sequence number of the rewritten data files. That's because there may be other data files that the new delete files would apply to if the delete file's sequence number was increased. For example, if the delete were an upsert instead -- a delete and an insert for the same row ID -- then moving the delete forward would cause Iceberg to drop the inserted replacement row. @yyanyy brings up a good point about how this limits the use of sequence numbers in other cases. I think that's okay. It is better to allow the table to be compacted. We'll figure out that problem as we design the secondary metadata. I think it would be okay because the new files would just not be covered by the secondary index and would need to be scanned. So it wouldn't introduce a correctness issue, just some potential staleness. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
