szehon-ho opened a new pull request, #7389:
URL: https://github.com/apache/iceberg/pull/7389

   This implements the existing RewritePositionDeleteFiles interface as a Spark 
action.
   
   This action compacts or splits position delete files based on input 
parameters. Most of the logic is reused from RewriteDataFiles via the new 
Rewriter classes added in #7175. The additional logic here is sorting 
position deletes locally by 'file_path' and 'pos', as defined in the Iceberg 
spec.
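
For illustration only, the ordering described above can be sketched in plain Python. This is not the Spark code in this PR; the `PositionDelete` type, the `sort_position_deletes` helper, and the sample paths are hypothetical names made up for the example. It only shows the intended order: by 'file_path' first, then by 'pos'.

```python
from typing import List, NamedTuple


class PositionDelete(NamedTuple):
    # Hypothetical in-memory stand-in for one row of a position delete file.
    file_path: str  # data file the delete applies to
    pos: int        # row position within that data file


def sort_position_deletes(deletes: List[PositionDelete]) -> List[PositionDelete]:
    """Return the deletes ordered by file_path, then by pos."""
    return sorted(deletes, key=lambda d: (d.file_path, d.pos))


# Sample paths are invented for the example.
deletes = [
    PositionDelete("s3://bucket/data/b.parquet", 7),
    PositionDelete("s3://bucket/data/a.parquet", 42),
    PositionDelete("s3://bucket/data/b.parquet", 2),
    PositionDelete("s3://bucket/data/a.parquet", 3),
]

sorted_deletes = sort_position_deletes(deletes)
for d in sorted_deletes:
    print(d.file_path, d.pos)
```

In the action itself the sort is applied locally within each rewritten file group rather than over an in-memory list; the sketch only shows the sort key.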
   
   Notably, this action also removes 'dangling deletes', i.e. position 
deletes whose data file is no longer live. Previously this was not possible 
in any Iceberg action. It is implemented via a left semi-join against the 
'data_files' table.
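
As a rough sketch of the semi-join semantics (plain Python, not the Spark join used in this PR), a delete row is kept only when its 'file_path' matches a live data file, and no columns from the data file side are carried over. The `drop_dangling_deletes` helper and the sample paths are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of a position delete row: (file_path, pos).
DeleteRow = Tuple[str, int]


def drop_dangling_deletes(
    deletes: Iterable[DeleteRow], live_data_files: Iterable[str]
) -> List[DeleteRow]:
    """Keep only deletes whose file_path is among the live data files.

    This mirrors a left semi-join: rows from the left side (deletes) are
    kept when a match exists on the right side (live data file paths),
    and nothing from the right side is added to the output rows.
    """
    live = set(live_data_files)
    return [row for row in deletes if row[0] in live]


# Sample data is invented for the example.
deletes = [
    ("data/a.parquet", 3),
    ("data/removed.parquet", 10),  # dangling: its data file is no longer live
    ("data/b.parquet", 2),
]
live_files = ["data/a.parquet", "data/b.parquet"]

kept = drop_dangling_deletes(deletes, live_files)
print(kept)
```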
   
   Remaining items: filter() is not yet supported. Because the position delete 
rewrite runs against the position_deletes metadata table, a filter written for 
the data table does not apply directly. Some work is needed to transform it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

