stevenzwu commented on code in PR #14287:
URL: https://github.com/apache/iceberg/pull/14287#discussion_r2449197516


##########
api/src/main/java/org/apache/iceberg/ExpireSnapshots.java:
##########
@@ -116,9 +116,39 @@ public interface ExpireSnapshots extends 
PendingUpdate<List<Snapshot>> {
    *
    * @param clean setting this to false will skip deleting expired manifests 
and files
    * @return this for method chaining
+   * @deprecated since 1.10.0, will be removed in 2.0.0; use {@link 
#cleanMode(CleanupMode)}
+   *     instead.
    */
+  @Deprecated
   ExpireSnapshots cleanExpiredFiles(boolean clean);
 
+  /**
+   * Configures the cleanup mode for expired files.
+   *
+   * <p>This method provides fine-grained control over which files are cleaned 
up during snapshot
+   * expiration. The cleanup modes are:
+   *
+   * <ul>

Review Comment:
   nit: add these as Javadoc to CleanupMode enum



##########
api/src/main/java/org/apache/iceberg/ExpireSnapshots.java:
##########
@@ -116,9 +116,39 @@ public interface ExpireSnapshots extends 
PendingUpdate<List<Snapshot>> {
    *
    * @param clean setting this to false will skip deleting expired manifests 
and files
    * @return this for method chaining
+   * @deprecated since 1.10.0, will be removed in 2.0.0; use {@link 
#cleanMode(CleanupMode)}
+   *     instead.
    */
+  @Deprecated
   ExpireSnapshots cleanExpiredFiles(boolean clean);
 
+  /**
+   * Configures the cleanup mode for expired files.
+   *
+   * <p>This method provides fine-grained control over which files are cleaned 
up during snapshot
+   * expiration. The cleanup modes are:
+   *
+   * <ul>
+   *   <li>{@link CleanupMode#ALL} - Clean up both metadata and data files 
(default)
+   *   <li>{@link CleanupMode#METADATA_ONLY} - Clean up only metadata files 
(manifests, manifest
+   *       lists), retain data files
+   *   <li>{@link CleanupMode#NONE} - Skip all file cleanup, only remove 
snapshot metadata
+   * </ul>
+   *
+   * <p>consider METADATA_ONLY mode when data files are shared across tables 
or when using
+   * procedures like add-files that may reference the same data files.
+   *
+   * <p>consider NONE mode when data and manifest files may be more 
efficiently removed using a
+   * distributed framework through the actions API
+   *
+   * @param mode the cleanup mode to use for expired snapshots
+   * @return this for method chaining
+   */
+  default ExpireSnapshots cleanMode(CleanupMode mode) {

Review Comment:
   unresolve this comment as I didn't see @nastra 's suggestion was applied



##########
core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java:
##########
@@ -251,11 +252,15 @@ public void cleanFiles(TableMetadata beforeExpiration, 
TableMetadata afterExpira
               }
             });
 
-    Set<String> filesToDelete =
-        findFilesToDelete(
-            manifestsToScan, manifestsToRevert, validIds, 
beforeExpiration.specsById());
+    if (CleanupMode.ALL == cleanupMode) {
+      Set<String> filesToDelete =
+          findFilesToDelete(
+              manifestsToScan, manifestsToRevert, validIds, 
beforeExpiration.specsById());
+      LOG.debug("Deleting {} data files", filesToDelete.size());
+      deleteFiles(filesToDelete, "data");
+    }
 
-    deleteFiles(filesToDelete, "data");
+    LOG.debug("Deleting {} manifest files", manifestsToDelete.size());
     deleteFiles(manifestsToDelete, "manifest");

Review Comment:
   I can understand @nastra 's question. it is not easy to connect the dot. 
   
   A couple of things might help
   
   1. add a `Preconditions.checkArgument` in the constructor of 
`IncrementalFileCleanup` and `ReachableFileCleanup`
   2. add a code comment here to briefly explain that this code path already 
requires cleanup (metadata_only or all)



##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -103,7 +104,12 @@ class RemoveSnapshots implements ExpireSnapshots {
 
   @Override
   public ExpireSnapshots cleanExpiredFiles(boolean clean) {
-    this.cleanExpiredFiles = clean;
+    LOG.warn("cleanExpiredFiles(boolean) is deprecated. Use 
cleanMode(CleanupMode) instead.");
+    Preconditions.checkArgument(

Review Comment:
   it seems ok to throw an exception here because we don't want to users to 
call both setters. only one should be used.



##########
api/src/main/java/org/apache/iceberg/CleanupMode.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+/** An enum representing possible clean up mode used in snapshot expiration. */
+public enum CleanupMode {
+  NONE(0),
+  METADATA_ONLY(1),
+  ALL(2);
+
+  CleanupMode(int id) {
+    this.id = id;
+  }
+
+  private final int id;
+
+  public int id() {
+    return id;
+  }
+
+  public boolean requireCleanup() {

Review Comment:
   unresolve this comment as I didn't see @nastra 's suggestion was applied



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to