RussellSpitzer commented on issue #3496: URL: https://github.com/apache/iceberg/issues/3496#issuecomment-963156998
The default behavior of the Action is to remove the files that are no longer live. The difference between the table API and the Action is that the Action determines the set of files to remove with a distributed job, while the table API does this calculation locally. The main reason for writing the Action was that the default API does not scale well for extremely large tables.

To preserve the functionality of the original API, we added a flag that makes the original API remove only the snapshots and not delete any files. The Action then takes the difference in table state between before and after running the API, and uses that information to delete the files.

This is explained in the Javadoc for the class:
https://github.com/apache/iceberg/blob/9b285d049c094ca6ee717e159249dee36d118894/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseExpireSnapshotsSparkAction.java#L54-L67

Here you can see the delete function being applied to the diff set result:
https://github.com/apache/iceberg/blob/9b285d049c094ca6ee717e159249dee36d118894/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseExpireSnapshotsSparkAction.java#L214-L221

The default delete function removes the files; see:
https://github.com/apache/iceberg/blob/9b285d049c094ca6ee717e159249dee36d118894/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseExpireSnapshotsSparkAction.java#L82-L89

---

So I do not believe any changes are needed here. If a user did want to use a separate delete facility, I would actually suggest they use
https://github.com/apache/iceberg/blob/9b285d049c094ca6ee717e159249dee36d118894/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/BaseExpireSnapshotsSparkAction.java#L154
which was added explicitly for users who have some sort of distributed async delete solution.
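
To illustrate the diff-then-delete flow described in this comment, here is a minimal sketch in plain Java. It is not the actual Iceberg code: the class and method names (`ExpireDiffSketch`, `filesToDelete`, `expireFiles`) are hypothetical, and the in-memory `before` and `after` sets are assumed stand-ins for the file listings that the Action computes with a distributed job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from Iceberg.
class ExpireDiffSketch {

  // Files referenced before expiration but not after it are no longer live.
  static Set<String> filesToDelete(Set<String> before, Set<String> after) {
    Set<String> diff = new TreeSet<>(before); // sorted copy, gives a stable order
    diff.removeAll(after);
    return diff;
  }

  // Applies the caller-supplied delete function to every file in the diff
  // and returns how many files were handed to it.
  static int expireFiles(Set<String> before, Set<String> after, Consumer<String> deleteFunc) {
    Set<String> expired = filesToDelete(before, after);
    expired.forEach(deleteFunc);
    return expired.size();
  }

  public static void main(String[] args) {
    // Assumed example paths, for illustration only.
    Set<String> before = Set.of("data/a.parquet", "data/b.parquet", "metadata/snap-1.avro");
    Set<String> after = Set.of("data/b.parquet");

    // A stand-in delete function that only records the paths it receives.
    List<String> deleted = new ArrayList<>();
    int count = expireFiles(before, after, deleted::add);
    System.out.println(count + " files handed to the delete function: " + deleted);
  }
}
```

Swapping in a different `deleteFunc` (for example, one that only queues paths for a later bulk delete) changes how files are removed without changing how the diff is computed.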
