aokolnychyi commented on code in PR #5469:
URL: https://github.com/apache/iceberg/pull/5469#discussion_r941924589
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java:
##########
@@ -140,19 +142,39 @@ public ExpireSnapshotsSparkAction deleteWith(Consumer<String> newDeleteFunc) {
*
* <p>This does not delete data files. To delete data files, run {@link #execute()}.
*
- * <p>This may be called before or after {@link #execute()} is called to return the expired file
- * list.
+ * <p>This may be called before or after {@link #execute()} to return the expired files.
*
* @return a Dataset of files that are no longer referenced by the table
+ * @deprecated since 1.0.0, will be removed in 1.1.0; use {@link #expireFiles()} instead.
*/
+ @Deprecated
public Dataset<Row> expire() {
- if (expiredFiles == null) {
+ // rely on the same query execution to reuse shuffles
+ QueryExecution queryExecution = expiredFileDS().queryExecution();
+ return new Dataset<>(queryExecution, RowEncoder.apply(queryExecution.analyzed().schema()));
+ }
+
+ /**
+ * Expires snapshots and commits the changes to the table, returning a Dataset of files to delete.
+ *
+ * <p>This does not delete data files. To delete data files, run {@link #execute()}.
+ *
+ * <p>This may be called before or after {@link #execute()} to return the expired files.
Review Comment:
I had the same question when I first looked at this. I assumed you would call
either `execute` or `expire`, not both. Then I saw the following comment in the
Javadoc and the related tests.
```
This may be called before or after {@link #execute()} to return the expired
files.
```
It turns out one could call `expire` after `execute` to fetch the list of
expired files, and it happened to work because Spark was reusing the shuffle
data. I had to keep the old behavior. I only changed the type of the instance
variable that used to cache the data frame.
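To illustrate the idea outside of Spark, here is a minimal plain-Java sketch of the pattern: cache the underlying lazy computation once and let both entry points go through it, so calling one after the other does not redo the expensive work. This is not the PR's code. `ExpireSketch`, `LazyPlan`, `Action`, and `scanRuns` are hypothetical names, and the memoized list only stands in for the shuffle reuse that Spark provides through a shared query execution.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; no Spark or Iceberg classes are involved.
final class ExpireSketch {

    /** Stand-in for a lazily evaluated query that records how often the expensive work runs. */
    static final class LazyPlan {
        private final Supplier<List<String>> compute;
        private List<String> result; // materialized on first use, then reused
        private int runs = 0;

        LazyPlan(Supplier<List<String>> compute) {
            this.compute = compute;
        }

        synchronized List<String> collect() {
            if (result == null) {
                result = compute.get();
                runs++;
            }
            return result;
        }

        synchronized int runs() {
            return runs;
        }
    }

    /** Stand-in for the action: it caches the plan, not one particular view of it. */
    static final class Action {
        private final Supplier<List<String>> scan;
        private LazyPlan plan;

        Action(Supplier<List<String>> scan) {
            this.scan = scan;
        }

        private synchronized LazyPlan plan() {
            if (plan == null) {
                plan = new LazyPlan(scan);
            }
            return plan;
        }

        /** Returns a fresh immutable copy of the expired file list, backed by the shared plan. */
        List<String> expire() {
            return List.copyOf(plan().collect());
        }

        /** Pretends to delete the files and returns how many were handled. */
        int execute() {
            return plan().collect().size();
        }

        /** Number of times the expensive scan has actually run. */
        int scanRuns() {
            return plan().runs();
        }
    }
}
```

In this sketch, calling `execute()` and then `expire()` (or the reverse) runs the scan only once, which mirrors the "before or after" contract in the Javadoc.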
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]