chenjunjiedada opened a new pull request #2216: URL: https://github.com/apache/iceberg/pull/2216
This adds a Spark action that replaces equality deletes with position deletes, which I think of as a minor compaction. The logic is:

1. Plan and group the tasks by partition. Currently it doesn't consider a filter; we may add one, such as a partition filter, later.
2. Use the delete matcher to keep the rows that match the equality delete set. The rows are projected with the file and pos fields.
3. Write the matched rows via the position delete writer.
4. Perform the rewrite files operation to replace the equality deletes with position deletes.

This adds an API in `RewriteFiles` to rewrite equality deletes to position deletes. It should keep the same semantics as the current API: the table's rows must be the same after the rewrite as before. This could also be used to combine position deletes to reduce the number of small files.
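The four steps in the description can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the PR's code or Iceberg's API: `DataFile`, `EqualityDeleteFile`, `to_position_deletes`, and the two `live_rows_*` helpers are hypothetical names, rows are held in memory as dicts, and sequence numbers (which in Iceberg decide which data files an equality delete applies to) are ignored for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical in-memory stand-in for a data file.
    path: str
    partition: str
    rows: list  # list of dicts, one per row, in file order


@dataclass
class EqualityDeleteFile:
    # Hypothetical in-memory stand-in for an equality delete file.
    path: str
    partition: str
    equality_fields: tuple  # column names that identify a deleted row
    keys: set               # set of value tuples over equality_fields


def matches(row, delete):
    """True if the row's equality-field values appear in the delete set."""
    return tuple(row[c] for c in delete.equality_fields) in delete.keys


def to_position_deletes(data_files, eq_delete_files):
    """Convert equality deletes into sorted (file path, position) pairs."""
    # Step 1: group data files and equality deletes by partition.
    data_by_part = defaultdict(list)
    for f in data_files:
        data_by_part[f.partition].append(f)
    deletes_by_part = defaultdict(list)
    for d in eq_delete_files:
        deletes_by_part[d.partition].append(d)

    position_deletes = []
    for partition, deletes in deletes_by_part.items():
        for data_file in data_by_part.get(partition, []):
            for pos, row in enumerate(data_file.rows):
                # Step 2: keep only rows that match the equality delete set,
                # projected down to (file, pos).
                if any(matches(row, d) for d in deletes):
                    # Step 3: "write" the match as a position delete.
                    position_deletes.append((data_file.path, pos))
    # Step 4 (swapping the delete files in a commit) is not modeled here.
    return sorted(position_deletes)


def live_rows_with_equality_deletes(data_files, eq_delete_files):
    """Rows visible when the equality deletes are applied directly."""
    return [
        row
        for f in data_files
        for row in f.rows
        if not any(
            d.partition == f.partition and matches(row, d)
            for d in eq_delete_files
        )
    ]


def live_rows_with_position_deletes(data_files, position_deletes):
    """Rows visible when only the position deletes are applied."""
    deleted = set(position_deletes)
    return [
        row
        for f in data_files
        for pos, row in enumerate(f.rows)
        if (f.path, pos) not in deleted
    ]
```

Comparing the results of `live_rows_with_equality_deletes` and `live_rows_with_position_deletes` on the same input is one way to check the invariant the description asks for: the visible rows should be identical before and after the rewrite.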
