szehon-ho opened a new pull request, #7389: URL: https://github.com/apache/iceberg/pull/7389
This implements the existing RewritePositionDeleteFiles interface as a Spark action. The action compacts or splits position delete files, based on input parameters.

Most of the logic is re-used from RewriteDataFiles, via the new Rewriter classes added in #7175. The additional logic here is sorting position deletes locally by 'file_path' and 'pos', as defined in the Iceberg spec.

Notably, this action also removes 'dangling deletes', i.e. position deletes that no longer refer to a live data file. Previously this was not possible in any Iceberg action. This is implemented via a left semi-join on the 'data_files' table.

Remaining items: filter() is not yet supported. Because the position deletes rewrite is done against the position_deletes metadata table, a filter on the data table does not apply directly. Some work is needed to transform the filter.
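To illustrate the two steps described above (dropping dangling deletes with a semi-join, then sorting by 'file_path' and 'pos'), here is a minimal plain-Python sketch. It is not the code in this PR, which runs in Spark; the names PositionDelete and rewrite_position_deletes, and the sample file names, are hypothetical and exist only for this illustration.

```python
from typing import Iterable, List, NamedTuple


class PositionDelete(NamedTuple):
    """Hypothetical stand-in for one row of a position delete file."""
    file_path: str  # data file the delete applies to
    pos: int        # row position within that data file


def rewrite_position_deletes(
    deletes: Iterable[PositionDelete],
    live_data_files: Iterable[str],
) -> List[PositionDelete]:
    """Drop dangling deletes, then sort the rest by (file_path, pos)."""
    live = set(live_data_files)
    # Semi-join semantics: keep a delete only if its data file is still live.
    # No columns from the data file side are added to the output.
    kept = [d for d in deletes if d.file_path in live]
    # Order by file_path first, then by pos within each file.
    return sorted(kept, key=lambda d: (d.file_path, d.pos))


deletes = [
    PositionDelete("b.parquet", 7),
    PositionDelete("a.parquet", 3),
    PositionDelete("gone.parquet", 0),  # dangling: data file no longer live
    PositionDelete("a.parquet", 1),
]
live_files = ["a.parquet", "b.parquet"]

rewritten = rewrite_position_deletes(deletes, live_files)
for d in rewritten:
    print(d.file_path, d.pos)
```

In this sketch the delete for "gone.parquet" is dropped because that path is not in the live set, and the remaining rows come out grouped by file and ordered by position.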
