chenjunjiedada opened a new pull request #2216:
URL: https://github.com/apache/iceberg/pull/2216


   This adds a Spark action that replaces equality deletes with position deletes, 
which I think of as a minor compaction. The logic is:
   
   1. Plan and group the tasks by partition. Currently it doesn't consider a 
filter; we may add one, such as a partition filter, later.
   2. Use the delete matcher to keep rows that match the equality delete set. 
The rows are projected with file and pos fields.
   3. Write the matched rows via position delete writer.
   4. Commit a rewrite-files operation that replaces the equality deletes with 
the position deletes.
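
   The steps above can be sketched in plain Python. This is only an illustration of the idea, not the code in this PR: the dict-based file layout and the function name `equality_to_position_deletes` are hypothetical, and the sketch ignores sequence numbers, filters, and the actual writer and commit.

```python
from collections import defaultdict


def equality_to_position_deletes(data_files, equality_deletes, equality_fields):
    """Convert equality deletes into (file path, position) pairs.

    data_files: list of dicts with keys 'path', 'partition', 'rows'
                (hypothetical in-memory stand-in for real data files).
    equality_deletes: dict mapping partition -> list of delete-key dicts.
    equality_fields: names of the columns the equality deletes compare on.
    """
    # Step 1: group the files by partition.
    by_partition = defaultdict(list)
    for data_file in data_files:
        by_partition[data_file["partition"]].append(data_file)

    result = {}
    for partition, files in by_partition.items():
        # Build the equality delete set for this partition.
        delete_keys = {
            tuple(delete[name] for name in equality_fields)
            for delete in equality_deletes.get(partition, [])
        }
        # Step 2: keep only rows that match the delete set,
        # projected down to (file, pos).
        matched = []
        for data_file in files:
            for pos, row in enumerate(data_file["rows"]):
                key = tuple(row[name] for name in equality_fields)
                if key in delete_keys:
                    matched.append((data_file["path"], pos))
        # Step 3: the matched rows become the position deletes,
        # ordered by file path and then position.
        if matched:
            result[partition] = sorted(matched)
    return result
```

   Step 4, the commit that swaps the old delete files for the new ones, is left out because it depends on the table API.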
   
   This also adds an API in `RewriteFiles` to rewrite equality deletes to position 
deletes. It should keep the same semantics as the current API: the rows must be 
the same before and after the rewrite. This could also be used to combine 
position deletes to reduce the number of small files.
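
   As a rough illustration of that invariant, the sketch below checks that a set of rows reads the same whether the deletes are applied by equality or by position, and shows how several small position delete lists could be combined into one. All names here are hypothetical helpers for illustration, not Iceberg APIs.

```python
def live_rows_by_equality(rows, delete_keys, fields):
    """Rows that survive when deletes are applied by matching key values."""
    return [r for r in rows if tuple(r[f] for f in fields) not in delete_keys]


def live_rows_by_position(rows, deleted_positions):
    """Rows that survive when deletes are applied by row position."""
    return [r for pos, r in enumerate(rows) if pos not in deleted_positions]


def merge_position_deletes(delete_lists):
    """Combine several lists of (path, pos) into one sorted, de-duplicated list."""
    return sorted({entry for entries in delete_lists for entry in entries})


rows = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 2}]
delete_keys = {(2,)}

# Positions of the rows that the equality deletes would remove.
positions = {pos for pos, r in enumerate(rows) if (r["id"],) in delete_keys}

# The rewrite must not change what a reader sees.
before = live_rows_by_equality(rows, delete_keys, ["id"])
after = live_rows_by_position(rows, positions)

# Two small position delete lists collapsed into one.
merged = merge_position_deletes([[("a.parquet", 3)], [("a.parquet", 1), ("a.parquet", 3)]])
```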

