aokolnychyi commented on code in PR #5469:
URL: https://github.com/apache/iceberg/pull/5469#discussion_r941924589


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java:
##########
@@ -140,19 +142,39 @@ public ExpireSnapshotsSparkAction deleteWith(Consumer<String> newDeleteFunc) {
    *
    * <p>This does not delete data files. To delete data files, run {@link #execute()}.
    *
-   * <p>This may be called before or after {@link #execute()} is called to return the expired file
-   * list.
+   * <p>This may be called before or after {@link #execute()} to return the expired files.
    *
    * @return a Dataset of files that are no longer referenced by the table
+   * @deprecated since 1.0.0, will be removed in 1.1.0; use {@link #expireFiles()} instead.
    */
+  @Deprecated
   public Dataset<Row> expire() {
-    if (expiredFiles == null) {
+    // rely on the same query execution to reuse shuffles
+    QueryExecution queryExecution = expiredFileDS().queryExecution();
+    return new Dataset<>(queryExecution, RowEncoder.apply(queryExecution.analyzed().schema()));
+  }
+
+  /**
+   * Expires snapshots and commits the changes to the table, returning a Dataset of files to delete.
+   *
+   * <p>This does not delete data files. To delete data files, run {@link #execute()}.
+   *
+   * <p>This may be called before or after {@link #execute()} to return the expired files.

Review Comment:
   Well, I had the same question when I started looking. I thought you would
call either `execute` or `expire`, not both. Then I saw the following comment
in the Javadoc and the related tests.
   
   ```
   This may be called before or after {@link #execute()} to return the expired files.
   ```
   
   It turns out one could call `expire` after `execute` to fetch the list of
expired files, and it magically worked because Spark was reusing the shuffle
data. I had to keep the old behavior, so I only changed the type of the
instance variable that used to cache the data frame.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

