RussellSpitzer commented on a change in pull request #1344:
URL: https://github.com/apache/iceberg/pull/1344#discussion_r473567025
##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -147,49 +149,41 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   @Override
   public ExpireSnapshotsActionResult execute() {
-    Dataset<Row> originalFiles = null;
-    try {
-      // Metadata before Expiration
-      originalFiles = buildValidFileDF().persist();
-      // Action to trigger persist
-      originalFiles.count();
-
-      // Perform Expiration
-      ExpireSnapshots expireSnaps = table.expireSnapshots().cleanExpiredFiles(false);
-      for (final Long id : expireSnapshotIdValues) {
-        expireSnaps = expireSnaps.expireSnapshotId(id);
-      }
-
-      if (expireOlderThanValue != null) {
-        expireSnaps = expireSnaps.expireOlderThan(expireOlderThanValue);
-      }
-
-      if (retainLastValue != null) {
-        expireSnaps = expireSnaps.retainLast(retainLastValue);
-      }
-
-      expireSnaps.commit();
-
-      // Metadata after Expiration
-      Dataset<Row> validFiles = buildValidFileDF();
-      Dataset<Row> filesToDelete = originalFiles.except(validFiles);
-
-      return deleteFiles(filesToDelete.toLocalIterator());
-    } finally {
-      if (originalFiles != null) {
-        originalFiles.unpersist();
-      }
+    // Metadata before Expiration
+    Dataset<Row> originalFiles = buildValidFileDF(ops.current());
+
+    // Perform Expiration
+    ExpireSnapshots expireSnaps = table.expireSnapshots().cleanExpiredFiles(false);
+    for (final Long id : expireSnapshotIdValues) {
+      expireSnaps = expireSnaps.expireSnapshotId(id);
+    }
+
+    if (expireOlderThanValue != null) {
+      expireSnaps = expireSnaps.expireOlderThan(expireOlderThanValue);
+    }
+
+    if (retainLastValue != null) {
+      expireSnaps = expireSnaps.retainLast(retainLastValue);
     }
+
+    expireSnaps.commit();
+
+    // Metadata after Expiration
+    Dataset<Row> validFiles = buildValidFileDF(ops.refresh());
+    Dataset<Row> filesToDelete = originalFiles.except(validFiles);
+
+    return deleteFiles(filesToDelete.toLocalIterator());
   }

   private Dataset<Row> appendTypeString(Dataset<Row> ds, String type) {
     return ds.select(new Column("file_path"), functions.lit(type).as("file_type"));
   }

-  private Dataset<Row> buildValidFileDF() {
-    return appendTypeString(buildValidDataFileDF(spark), DATA_FILE)
-        .union(appendTypeString(buildManifestFileDF(spark), MANIFEST))
-        .union(appendTypeString(buildManifestListDF(spark, table), MANIFEST_LIST));
+  private Dataset<Row> buildValidFileDF(TableMetadata metadata) {
+    StaticTableOperations staticOps = new StaticTableOperations(metadata.metadataFileLocation(), table.io());
Review comment:
I think I understand what you are asking for here, but I'm not sure I
like how it looks, since I end up with two methods: one takes a
metadataFileLocation and the other takes a `Table`. The
metadataFileLocation version builds the StaticTableOperations and
BaseTable from the location and passes the result to the `Table` method,
while the `Table` method is the one the orphan files action uses
directly. Take a look at the new version and see if we are on the same
page.
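
Roughly, the shape I'm describing looks like this. This is just a sketch
of the two overloads, not the exact committed code: the
StaticTableOperations and BaseTable constructors are real Iceberg APIs,
but the signatures of the buildValid*/buildManifest* helpers taking a
Table argument are assumptions here.

```java
// Sketch only: two buildValidFileDF overloads inside ExpireSnapshotsAction,
// relying on the existing fields (table, spark) and constants shown above.

// Pins the file listing to one point-in-time metadata file, so the "before"
// and "after" file sets are computed against fixed snapshots instead of a
// live, refreshable table (replacing the old persist()/count() trick).
private Dataset<Row> buildValidFileDF(TableMetadata metadata) {
  StaticTableOperations staticOps =
      new StaticTableOperations(metadata.metadataFileLocation(), table.io());
  return buildValidFileDF(new BaseTable(staticOps, table.name()));
}

// Table-based version, which the orphan files action can call directly.
// The helper signatures taking (spark, staticTable) are assumed for this
// sketch.
private Dataset<Row> buildValidFileDF(Table staticTable) {
  return appendTypeString(buildValidDataFileDF(spark, staticTable), DATA_FILE)
      .union(appendTypeString(buildManifestFileDF(spark, staticTable), MANIFEST))
      .union(appendTypeString(buildManifestListDF(spark, staticTable), MANIFEST_LIST));
}
```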
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]