RussellSpitzer commented on a change in pull request #1344:
URL: https://github.com/apache/iceberg/pull/1344#discussion_r473567025
##########
File path: spark/src/main/java/org/apache/iceberg/actions/ExpireSnapshotsAction.java
##########
@@ -147,49 +149,41 @@ public ExpireSnapshotsAction deleteWith(Consumer<String> newDeleteFunc) {
   @Override
   public ExpireSnapshotsActionResult execute() {
-    Dataset<Row> originalFiles = null;
-    try {
-      // Metadata before Expiration
-      originalFiles = buildValidFileDF().persist();
-      // Action to trigger persist
-      originalFiles.count();
-
-      // Perform Expiration
-      ExpireSnapshots expireSnaps = table.expireSnapshots().cleanExpiredFiles(false);
-      for (final Long id : expireSnapshotIdValues) {
-        expireSnaps = expireSnaps.expireSnapshotId(id);
-      }
-
-      if (expireOlderThanValue != null) {
-        expireSnaps = expireSnaps.expireOlderThan(expireOlderThanValue);
-      }
-
-      if (retainLastValue != null) {
-        expireSnaps = expireSnaps.retainLast(retainLastValue);
-      }
-
-      expireSnaps.commit();
-
-      // Metadata after Expiration
-      Dataset<Row> validFiles = buildValidFileDF();
-      Dataset<Row> filesToDelete = originalFiles.except(validFiles);
-
-      return deleteFiles(filesToDelete.toLocalIterator());
-    } finally {
-      if (originalFiles != null) {
-        originalFiles.unpersist();
-      }
+    // Metadata before Expiration
+    Dataset<Row> originalFiles = buildValidFileDF(ops.current());
+
+    // Perform Expiration
+    ExpireSnapshots expireSnaps = table.expireSnapshots().cleanExpiredFiles(false);
+    for (final Long id : expireSnapshotIdValues) {
+      expireSnaps = expireSnaps.expireSnapshotId(id);
+    }
+
+    if (expireOlderThanValue != null) {
+      expireSnaps = expireSnaps.expireOlderThan(expireOlderThanValue);
+    }
+
+    if (retainLastValue != null) {
+      expireSnaps = expireSnaps.retainLast(retainLastValue);
     }
+
+    expireSnaps.commit();
+
+    // Metadata after Expiration
+    Dataset<Row> validFiles = buildValidFileDF(ops.refresh());
+    Dataset<Row> filesToDelete = originalFiles.except(validFiles);
+
+    return deleteFiles(filesToDelete.toLocalIterator());
   }

   private Dataset<Row> appendTypeString(Dataset<Row> ds, String type) {
     return ds.select(new Column("file_path"), functions.lit(type).as("file_type"));
   }

-  private Dataset<Row> buildValidFileDF() {
-    return appendTypeString(buildValidDataFileDF(spark), DATA_FILE)
-        .union(appendTypeString(buildManifestFileDF(spark), MANIFEST))
-        .union(appendTypeString(buildManifestListDF(spark, table), MANIFEST_LIST));
+  private Dataset<Row> buildValidFileDF(TableMetadata metadata) {
+    StaticTableOperations staticOps = new StaticTableOperations(metadata.metadataFileLocation(), table.io());
Review comment:
I think I understand what you are asking for here, but I'm not sure I
like how it looks, since I end up with two methods: one takes a
metadataFileLocation and the other takes a `Table`. The
metadataFileLocation version builds the StaticTableOperations and
BaseTable from the location and passes the result to the `Table` method,
while the `Table` method is the one the orphan files action uses
directly. Take a look at the new version and see if we are on the same
page.
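
Roughly, the shape I'm describing looks like this. This is just a sketch
of the two overloads, not the exact committed code: the
StaticTableOperations and BaseTable constructors are real Iceberg APIs,
but the signatures of the buildValid*/buildManifest* helpers taking a
Table argument are assumptions here.

```java
// Sketch only: two buildValidFileDF overloads inside ExpireSnapshotsAction,
// relying on the existing fields (table, spark) and constants shown above.

// Pins the file listing to one point-in-time metadata file, so the "before"
// and "after" file sets are computed against fixed snapshots instead of a
// live, refreshable table (replacing the old persist()/count() trick).
private Dataset<Row> buildValidFileDF(TableMetadata metadata) {
  StaticTableOperations staticOps =
      new StaticTableOperations(metadata.metadataFileLocation(), table.io());
  return buildValidFileDF(new BaseTable(staticOps, table.name()));
}

// Table-based version, which the orphan files action can call directly.
// The helper signatures taking (spark, staticTable) are assumed for this
// sketch.
private Dataset<Row> buildValidFileDF(Table staticTable) {
  return appendTypeString(buildValidDataFileDF(spark, staticTable), DATA_FILE)
      .union(appendTypeString(buildManifestFileDF(spark, staticTable), MANIFEST))
      .union(appendTypeString(buildManifestListDF(spark, staticTable), MANIFEST_LIST));
}
```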
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]