chenjunjiedada commented on PR #5760:
URL: https://github.com/apache/iceberg/pull/5760#issuecomment-1528310117

   > Did I get it correctly? It only applies to position deletes?
   
   Correct.
   
   >My primary worry is that this would require a spec change and quite a bit 
of code to populate the new value. For instance, we currently only track file 
names when writing position deletes. After this, we would have to project and 
keep track of the sequence number per each referenced data file. Even after all 
of that, we can still get false positives.
   
   Yes, it needs a new field, as added in this PR. In Spark MoR mode we may also need to track the sequence number of the referenced data files: we could populate a null value at first and fill in the correct value in a later rewrite action, but, as you mentioned, that still allows false positives. In the Flink upsert case, however, the value is always available at write time, so there is no false-positive problem.
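   To make the trade-off concrete, here is a minimal sketch of the check described above. All names (`DataFile`, `PositionDeleteFile`, `may_apply`) are hypothetical and not Iceberg's actual API; it only illustrates how a tracked (or null) referenced sequence number affects planning.

   ```python
   # Hypothetical sketch, not Iceberg's real classes or planning code.
   from dataclasses import dataclass
   from typing import Optional


   @dataclass
   class DataFile:
       path: str
       sequence_number: int


   @dataclass
   class PositionDeleteFile:
       path: str
       sequence_number: int
       # Sequence number of the referenced data file; None when the writer
       # could not track it, to be filled in by a later rewrite action.
       referenced_data_sequence_number: Optional[int] = None


   def may_apply(delete: PositionDeleteFile, data: DataFile) -> bool:
       """True when the delete file might contain positions for `data`.

       A position delete only applies to data files whose sequence number
       is <= the delete's sequence number. When the referenced sequence
       number is unknown (None) we must conservatively assume the delete
       applies, which is where the false positives come from.
       """
       if delete.sequence_number < data.sequence_number:
           return False
       if delete.referenced_data_sequence_number is None:
           return True  # unknown: assume it applies (false-positive risk)
       return delete.referenced_data_sequence_number == data.sequence_number
   ```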
   
   >I am currently working on an alternative planning for position deletes in 
Spark, where I want to open files in a distributed manner and squash them into 
a bitmap per data file. This would give us a reliable way to check if delete 
files apply and would also avoid the need to open the same delete file multiple 
times for different data files.
   
   Sounds cool and promising; looking forward to it.
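   For readers following along, a toy sketch of the "squash into a bitmap per data file" idea quoted above (this is purely illustrative, not the actual Spark implementation; a real implementation would use roaring bitmaps rather than Python ints):

   ```python
   # Illustrative sketch: merge position-delete rows into one bitmap of
   # deleted row positions per data file. The bitmap is modeled as a
   # Python int with bit i set when row position i is deleted.
   from collections import defaultdict
   from typing import Dict, Iterable, Tuple


   def squash_position_deletes(
       delete_rows: Iterable[Tuple[str, int]]
   ) -> Dict[str, int]:
       """delete_rows: (file_path, position) pairs read from position
       delete files, possibly opened in parallel tasks. Each delete file
       is opened once; its rows are folded into the per-file bitmaps."""
       bitmaps: Dict[str, int] = defaultdict(int)
       for path, pos in delete_rows:
           bitmaps[path] |= 1 << pos
       return dict(bitmaps)


   def is_deleted(bitmap: int, pos: int) -> bool:
       """Reliable membership check: no false positives, unlike the
       sequence-number heuristic."""
       return (bitmap >> pos) & 1 == 1
   ```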


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
