chenjunjiedada opened a new pull request #2364:
URL: https://github.com/apache/iceberg/pull/2364


   This is a sub-PR of #2216. It adds a Spark action that replaces equality deletes with position deletes, which I think of as a minor compaction. The logic is:
   
   1. Plan the tasks and group them by partition. Currently no filter is applied; a filter, such as a partition filter, may be added later.
   2. Use the delete matcher to keep the rows that match the equality delete set. The matched rows are projected to the file and pos fields.
   3. Write the matched rows with the position delete writer.
   4. Commit a RewriteFiles operation that replaces the equality deletes with the new position deletes.
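The four steps above can be modeled without Spark or Iceberg. This is a minimal, hypothetical sketch, not the code in this PR: `EqualityToPositionDeletes`, `DataRow`, `EqDelete`, `PosDelete`, and `convert` are illustrative names, each row is assumed to carry its partition, file path, and position, and an equality delete is reduced to a set of key values scoped to one partition. It ignores data sequence numbers, which decide which data files an equality delete actually applies to, and it uses none of Iceberg's real types (the delete matcher, the position delete writer, RewriteFiles).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, dependency-free model of converting equality deletes into
// position deletes. None of these types are Iceberg APIs.
final class EqualityToPositionDeletes {

  // A data row tagged with its partition, source file path, and position in that file.
  record DataRow(String partition, String file, long pos, String key) {}

  // An equality delete: every row in `partition` whose key is in `keys` is deleted.
  record EqDelete(String partition, Set<String> keys) {}

  // A position delete: the row at `pos` in `file` is deleted.
  record PosDelete(String file, long pos) {}

  // Iceberg's table spec requires position delete rows sorted by file path, then position.
  static final Comparator<PosDelete> FILE_THEN_POS =
      Comparator.comparing(PosDelete::file).thenComparingLong(PosDelete::pos);

  // Steps 1-3. The returned map stands in for the per-partition position
  // delete files that step 4 would commit in place of the equality deletes.
  static Map<String, List<PosDelete>> convert(List<DataRow> rows, List<EqDelete> deletes) {
    // Step 1: group rows by partition (no filter is applied).
    Map<String, List<DataRow>> rowsByPartition = new TreeMap<>();
    for (DataRow row : rows) {
      rowsByPartition.computeIfAbsent(row.partition(), p -> new ArrayList<>()).add(row);
    }

    // Step 2: keep rows in the delete's partition whose key matches,
    // projected to (file, pos). The sorted set also drops duplicate matches.
    Map<String, TreeSet<PosDelete>> matched = new TreeMap<>();
    for (EqDelete delete : deletes) {
      for (DataRow row : rowsByPartition.getOrDefault(delete.partition(), List.of())) {
        if (delete.keys().contains(row.key())) {
          matched.computeIfAbsent(row.partition(), p -> new TreeSet<>(FILE_THEN_POS))
              .add(new PosDelete(row.file(), row.pos()));
        }
      }
    }

    // Step 3: emit each partition's matches in file-then-position order.
    Map<String, List<PosDelete>> result = new TreeMap<>();
    matched.forEach((partition, set) -> result.put(partition, new ArrayList<>(set)));
    return result;
  }
}
```

Collecting matches in a sorted set keeps each partition's output in file-then-position order and collapses duplicates when more than one equality delete hits the same row.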
   
   This adds an API to RewriteFiles for rewriting equality deletes into position deletes. It keeps the same semantics as the existing API: the rows in the table must be the same before and after the rewrite. The same API could also be used to combine position delete files and reduce the number of small files.
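The requirement that rows stay the same before and after the rewrite can be stated as a check. This is a hypothetical sketch, not Iceberg's RewriteFiles API: `DeleteRewriteCheck`, `Row`, `PosDelete`, `sameRows`, and `combine` are illustrative names, and a single equality delete on one `key` column is assumed. `combine` models merging several small position delete files into one.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the rewrite invariant; not Iceberg's RewriteFiles API.
final class DeleteRewriteCheck {

  // A data row identified by its file path and position in that file.
  record Row(String file, long pos, String key) {}

  // A position delete: the row at `pos` in `file` is deleted.
  record PosDelete(String file, long pos) {}

  static final Comparator<PosDelete> FILE_THEN_POS =
      Comparator.comparing(PosDelete::file).thenComparingLong(PosDelete::pos);

  // Rows that survive an equality delete on the `key` column.
  static List<Row> applyEquality(List<Row> rows, Set<String> deletedKeys) {
    List<Row> live = new ArrayList<>();
    for (Row row : rows) {
      if (!deletedKeys.contains(row.key())) {
        live.add(row);
      }
    }
    return live;
  }

  // Rows that survive a set of position deletes.
  static List<Row> applyPositions(List<Row> rows, Collection<PosDelete> deletes) {
    Set<PosDelete> deleted = new HashSet<>(deletes);
    List<Row> live = new ArrayList<>();
    for (Row row : rows) {
      if (!deleted.contains(new PosDelete(row.file(), row.pos()))) {
        live.add(row);
      }
    }
    return live;
  }

  // The rewrite is valid only if both versions of the table return the same rows.
  static boolean sameRows(List<Row> rows, Set<String> deletedKeys,
      Collection<PosDelete> replacement) {
    return applyEquality(rows, deletedKeys).equals(applyPositions(rows, replacement));
  }

  // Combine several small position delete files into one sorted, deduplicated list.
  static List<PosDelete> combine(List<List<PosDelete>> files) {
    TreeSet<PosDelete> merged = new TreeSet<>(FILE_THEN_POS);
    files.forEach(merged::addAll);
    return new ArrayList<>(merged);
  }
}
```

For example, replacing an equality delete on `k1` with position deletes for every row holding `k1` passes `sameRows`; leaving out one of those positions fails it, because that row would reappear after the rewrite.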
   
   This may need some changes once https://github.com/apache/iceberg/pull/2294 is merged.


-- 
This is an automated message from the Apache Git Service.