Re: [PR] Spark: add rest.catalog-purge property to delegate DROP TABLE PURGE to REST catalogs [iceberg]

via GitHub Fri, 29 May 2026 00:35:24 -0700


felixschneider99 commented on code in PR #15614:
URL: https://github.com/apache/iceberg/pull/15614#discussion_r3322836574



##########
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestRestDropPurgeTable.java:
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark;
+
+import static org.apache.iceberg.rest.RESTCatalogProperties.REST_CATALOG_PURGE;
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.ArgumentMatchers.anyBoolean;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+import java.io.File;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.hadoop.HadoopTables;
+import org.apache.iceberg.inmemory.InMemoryCatalog;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.rest.RESTCatalog;
+import org.apache.iceberg.types.Types;
+import org.apache.spark.sql.connector.catalog.Identifier;
+import org.apache.spark.sql.util.CaseInsensitiveStringMap;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+/**
+ * Tests for the REST catalog purge delegation feature in {@link SparkCatalog}.
+ *
+ * <p>Verifies that {@link org.apache.iceberg.rest.RESTCatalogProperties#REST_CATALOG_PURGE}
+ * controls whether Spark delegates DROP TABLE PURGE to the REST catalog or performs client-side
+ * file deletion.
+ */
+public class TestRestDropPurgeTable extends TestBase {
+
+  private static final Schema SCHEMA =
+      new Schema(Types.NestedField.required(1, "id", Types.LongType.get()));
+
+  private static final Identifier SPARK_ID = Identifier.of(new String[] {"ns"}, "test_table");
+
+  @TempDir private File tableDir;
+
+  private RESTCatalog restCatalogSpy;
+
+  private SparkCatalog createCatalog(boolean catalogPurge) {
+    Table table =
+        new HadoopTables(new Configuration())
+            .create(SCHEMA, PartitionSpec.unpartitioned(), tableDir.getAbsolutePath());
+
+    restCatalogSpy = mock(RESTCatalog.class);
+    when(restCatalogSpy.loadTable(any())).thenReturn(table);
+    when(restCatalogSpy.dropTable(any(), anyBoolean())).thenReturn(true);
+
+    SparkCatalog catalog =
+        new SparkCatalog() {
+          @Override
+          protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap options) {
+            return restCatalogSpy;
+          }
+        };
+    catalog.initialize(
+        "test_catalog",
+        new CaseInsensitiveStringMap(
+            ImmutableMap.of(REST_CATALOG_PURGE, Boolean.toString(catalogPurge))));
+    return catalog;
+  }
+
+  @Test
+  void purgeTableDelegatesToCatalogWhenEnabled() {
+    SparkCatalog catalog = createCatalog(true);
+    catalog.purgeTable(SPARK_ID);
+    verify(restCatalogSpy).dropTable(any(), eq(true));
+  }
+
+  @Test
+  void purgeTableDoesClientSidePurgeWhenDisabled() {
+    SparkCatalog catalog = createCatalog(false);
+    catalog.purgeTable(SPARK_ID);
+    verify(restCatalogSpy).dropTable(any(), eq(false));
+  }
+
+  @Test
+  void initializationFailsWhenPurgeEnabledWithNonRestCatalog() {
+    SparkCatalog catalog =
+        new SparkCatalog() {
+          @Override
+          protected Catalog buildIcebergCatalog(String name, CaseInsensitiveStringMap options) {
+            return new InMemoryCatalog();
+          }
+        };
+    assertThatThrownBy(
+            () ->
+                catalog.initialize(
+                    "test_catalog",
+                    new CaseInsensitiveStringMap(ImmutableMap.of(REST_CATALOG_PURGE, "true"))))
+        .isInstanceOf(IllegalArgumentException.class)
+        .hasMessageContaining(REST_CATALOG_PURGE);
+  }

Review Comment:
   Added `purgeTableDelegatesToCatalogWhenEnabledViaSessionCatalog()` to 
`TestRestDropPurgeTable`. It wraps a `SparkCatalog` backed by a `RESTCatalog` 
mock inside a `SparkSessionCatalog` and verifies that `purgeTable` results in 
`dropTable(..., true)` on the mock.
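
A minimal, dependency-free sketch of the behavior these tests check, in case it helps the discussion. It is not the PR's code: `DropCatalog`, `RecordingCatalog`, `PurgingCatalog`, and `SessionWrapper` are hypothetical stand-ins for the Iceberg catalog, the Mockito mock, `SparkCatalog`, and `SparkSessionCatalog`, and it assumes the session wrapper simply forwards `purgeTable` to the wrapped catalog.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Iceberg catalog; only dropTable is modeled.
interface DropCatalog {
  boolean dropTable(String identifier, boolean purge);
}

// Plays the role of the Mockito mock: records every dropTable call it receives.
final class RecordingCatalog implements DropCatalog {
  final List<String> calls = new ArrayList<>();

  @Override
  public boolean dropTable(String identifier, boolean purge) {
    calls.add(identifier + " purge=" + purge);
    return true;
  }
}

// Hypothetical model of the purge decision made by the Spark-side catalog.
final class PurgingCatalog {
  private final DropCatalog delegate;
  private final boolean catalogPurge;

  PurgingCatalog(DropCatalog delegate, boolean catalogPurge) {
    this.delegate = delegate;
    this.catalogPurge = catalogPurge;
  }

  boolean purgeTable(String identifier) {
    if (catalogPurge) {
      // Delegation enabled: the catalog is asked to purge the table itself.
      return delegate.dropTable(identifier, true);
    }
    // Delegation disabled: the catalog only drops the table entry.
    // The client-side file deletion is omitted from this sketch.
    return delegate.dropTable(identifier, false);
  }
}

// Hypothetical model of a session-catalog wrapper (assumption: it forwards as-is).
final class SessionWrapper {
  private final PurgingCatalog inner;

  SessionWrapper(PurgingCatalog inner) {
    this.inner = inner;
  }

  boolean purgeTable(String identifier) {
    return inner.purgeTable(identifier);
  }
}
```

With the flag enabled, `new SessionWrapper(new PurgingCatalog(recording, true)).purgeTable("ns.test_table")` leaves exactly one recorded call with `purge=true`, which is the same observation the new test makes through `verify(...)`.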



##########
core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java:
##########
@@ -55,6 +55,19 @@ private RESTCatalogProperties() {}
   public static final String TABLE_CACHE_MAX_ENTRIES = "rest-table-cache.max-entries";
   public static final int TABLE_CACHE_MAX_ENTRIES_DEFAULT = 100;
 
+  /**
+   * Controls whether engines using a REST Catalog should delegate DROP TABLE PURGE requests to the
+   * catalog instead of performing client-side file deletion.
+   *
+   * <p>When enabled, the engine sends the purge request to the REST catalog, allowing the catalog
+   * to handle deletion.
+   *
+   * <p>Defaults to false for backward compatibility.
+   */
+  public static final String REST_CATALOG_PURGE = "rest.catalog-purge";

Review Comment:
   Moved the property into a new `SparkCatalogProperties` class in the Spark 
module. The key is now `rest-catalog-purge` (a dash in place of the dot), as 
you suggested.
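
For reference, a rough sketch of what such a holder class could look like. Only the class name and the key string come from this thread, and the `false` default comes from the Javadoc in the diff above. The `REST_CATALOG_PURGE_DEFAULT` constant and the `restCatalogPurge` helper are hypothetical names added for illustration and may not match the PR.

```java
import java.util.Map;

// Sketch only: the class name and key string are taken from the review comment;
// everything else is an assumption made for illustration.
final class SparkCatalogProperties {
  private SparkCatalogProperties() {}

  // Key named in the review comment.
  static final String REST_CATALOG_PURGE = "rest-catalog-purge";

  // Hypothetical constant; the Javadoc says the property defaults to false.
  static final boolean REST_CATALOG_PURGE_DEFAULT = false;

  // Hypothetical helper: reads the flag from catalog options, using the default when absent.
  static boolean restCatalogPurge(Map<String, String> options) {
    String value = options.get(REST_CATALOG_PURGE);
    return value == null ? REST_CATALOG_PURGE_DEFAULT : Boolean.parseBoolean(value);
  }
}
```

Note that `Boolean.parseBoolean` treats any value other than a case-insensitive `"true"` as `false`, so a typo in the option value silently keeps client-side purge. Stricter validation would be a separate design choice.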



