(iceberg) branch main updated: Spark: Trim row-level test parameter rows from 6 to 3 (#16549)

stevenwu Wed, 27 May 2026 11:35:09 -0700

This is an automated email from the ASF dual-hosted git repository.

stevenzwu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git



The following commit(s) were added to refs/heads/main by this push:
     new 176c6e30f6 Spark: Trim row-level test parameter rows from 6 to 3 
(#16549)
176c6e30f6 is described below

commit 176c6e30f61280c41cc93ea98a73aa3b9403fad5
Author: Steven Zhen Wu <[email protected]>
AuthorDate: Wed May 27 11:34:19 2026 -0700

    Spark: Trim row-level test parameter rows from 6 to 3 (#16549)
    
    * Spark: Trim row-level test parameter rows from 6 to 3
    
    Reduces SparkRowLevelOperationsTestBase parameter rows from 6 to 3 in
    v4.0 / v4.1 (and from 7 to 3 in v3.5), shifting from "test every catalog
    backend" to "test the catalogs that matter for production":
    
      - testhive (Hive) — kept as the established Hive metastore baseline
      - testrest (REST) — added in place of testhadoop, since REST is the
        OSS-strategic catalog and testhadoop isn't recommended for prod
      - spark_catalog (REST-backed) — repointed from Hive to REST so the
        SessionCatalog row exercises the REST commit path instead of Hive
    
    formatVersion 2 covered by the testhive and testrest rows; formatVersion
    3 covered by the spark_catalog/REST row, which exercises the DV
    (deletion-vector) path that validateSnapshot checks via
    formatVersion >= 3.
    
    The trim affects 9 concrete subclasses (TestCopyOnWriteMerge/Update/Delete,
    TestMergeOnReadMerge/Update/Delete, both *MergeMetrics, and
    TestMergeSchemaEvolution), each cutting test invocations by 50% (~57%
    on Spark 3.5).
    
    Note: TestCopyOnWriteWithLineage and TestMergeOnReadWithLineage are
    unaffected — TestRowLevelOperationsWithLineage redeclares parameters()
    with its own row set.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
    
    * Replace non-session REST row with second testhive row
    
    Drops the testrest row in favor of restoring testhive as the carrier for
    the HASH/null/DISTRIBUTED axes that previously sat on testhadoop. REST
    catalog coverage is preserved by the spark_catalog (REST-backed) row.
    
    Rationale: for the row-level operation code paths these tests exercise,
    a non-session SparkCatalog wrapper around REST adds little beyond what
    the SessionCatalog wrapper already covers. Both return SparkTable; both
    commit through the same MergingSnapshotProducer; the differences live
    in table-resolution paths (DDL/aliasing), not in MERGE/UPDATE/DELETE.
    Other test classes (TestStructuredStreamingRead3 etc.) already exercise
    REST as a non-session catalog.
    
    Eliminates 2 of the 3 known TestBase.move() / metadata-delete fixture
    failures from the previous version. The remaining failure
    (testDeleteWithoutScanningTable on spark_catalog/REST) is a pre-existing
    TestBase.move() limitation with non-URI paths and needs a follow-up fix.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
    
    * Spark: Fix TestBase.move() to handle locations without URI schemes
    
    Paths.get(URI.create(location)) requires the URI to carry a scheme.
    HiveCatalog returns manifest paths with file:// schemes, so the
    existing move() works. RESTCatalog (as configured by RESTServerExtension
    in tests) returns plain local paths without a scheme, which makes
    Paths.get(URI) throw IllegalArgumentException: Missing scheme.
    
    Add a small toPath() helper that falls back to Paths.get(location) when
    URI.create(location).getScheme() is null. This unblocks
    testDeleteWithoutScanningTable and the equivalent MERGE-side test on
    spark_catalog/REST rows.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
    
    * Build: Fix REST catalog test fixture warehouse path scheme
    
    Move the URI-scheme handling from TestBase.move() (Spark) to the source
    of the inconsistency: the REST catalog test fixture's default warehouse
    path. RESTCatalogServer used getAbsolutePath() which returns plain
    filesystem paths without a URI scheme; HiveCatalog and HadoopCatalog
    return file:// paths via Hadoop's Path machinery, so test fixtures
    that consume catalog paths (e.g. TestBase.move() which calls
    Paths.get(URI.create(...))) work for those catalogs but break for REST.
    
    Switch to toURI().toString() so the REST fixture matches the Hive and
    Hadoop convention. testDeleteWithoutScanningTable and the equivalent
    MERGE-side test now pass on the spark_catalog/REST row without needing
    scheme-handling logic at every consumer.
    
    Reverts the TestBase.move() change from the previous commit.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
    
    * Spark: Drop REST-only path workaround in TestRewriteTablePathProcedure
    
    The test had a special-case branch that called file.getAbsolutePath() (no
    scheme) instead of file.toURI().toString() when catalogName was testrest,
    because the old REST fixture returned warehouse paths without a scheme
    and the test needed the deletes.parquet path to match.
    
    With RESTCatalogServer now using toURI().toString() for the default
    warehouse, table.location() returns a scheme-prefixed path on REST too,
    so the workaround is no longer needed - and is now incorrect, since
    RewriteTablePathUtil.newPositionDeleteEntry validates that delete file
    paths start with the table location prefix (which now includes file://).
    
    Removing the special case lets the simple toURI().toString() path apply
    uniformly across all catalogs.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
    
    ---------
    
    Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
 .../org/apache/iceberg/rest/RESTCatalogServer.java |  2 +-
 .../SparkRowLevelOperationsTestBase.java           | 70 ++--------------------
 .../extensions/TestRewriteTablePathProcedure.java  |  7 ---
 .../SparkRowLevelOperationsTestBase.java           | 58 ++----------------
 .../extensions/TestRewriteTablePathProcedure.java  |  7 ---
 .../SparkRowLevelOperationsTestBase.java           | 58 ++----------------
 .../extensions/TestRewriteTablePathProcedure.java  |  7 ---
 7 files changed, 16 insertions(+), 193 deletions(-)

diff --git 
a/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java 
b/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java
index 2e4541b50b..daed482c74 100644
--- 
a/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java
+++ 
b/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java
@@ -92,7 +92,7 @@ public class RESTCatalogServer {
     if (warehouseLocation == null) {
       File tmp = 
java.nio.file.Files.createTempDirectory("iceberg_warehouse").toFile();
       tmp.deleteOnExit();
-      warehouseLocation = new File(tmp, "iceberg_data").getAbsolutePath();
+      warehouseLocation = new File(tmp, "iceberg_data").toURI().toString();
       catalogProperties.put(CatalogProperties.WAREHOUSE_LOCATION, 
warehouseLocation);
 
       LOG.info("No warehouse location set. Defaulting to temp location: {}", 
warehouseLocation);
diff --git 
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
 
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
index 893f9931cf..72988ae0ed 100644
--- 
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
+++ 
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
@@ -46,11 +46,10 @@ import java.io.UncheckedIOException;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
-import java.util.Random;
 import java.util.Set;
 import java.util.UUID;
-import java.util.concurrent.ThreadLocalRandom;
 import java.util.stream.Collectors;
+import org.apache.iceberg.CatalogProperties;
 import org.apache.iceberg.DataFile;
 import org.apache.iceberg.FileFormat;
 import org.apache.iceberg.Files;
@@ -83,8 +82,6 @@ import org.junit.jupiter.api.extension.ExtendWith;
 @ExtendWith(ParameterizedTestExtension.class)
 public abstract class SparkRowLevelOperationsTestBase extends 
ExtensionsTestBase {
 
-  private static final Random RANDOM = ThreadLocalRandom.current();
-
   @Parameter(index = 3)
   protected FileFormat fileFormat;
 
@@ -135,79 +132,22 @@ public abstract class SparkRowLevelOperationsTestBase 
extends ExtensionsTestBase
             "default-namespace", "default"),
         FileFormat.PARQUET,
         true,
-        WRITE_DISTRIBUTION_MODE_NONE,
-        false,
-        "test",
-        DISTRIBUTED,
-        2
-      },
-      {
-        "testhadoop",
-        SparkCatalog.class.getName(),
-        ImmutableMap.of("type", "hadoop"),
-        FileFormat.PARQUET,
-        RANDOM.nextBoolean(),
         WRITE_DISTRIBUTION_MODE_HASH,
-        true,
-        null,
-        LOCAL,
-        2
-      },
-      {
-        "spark_catalog",
-        SparkSessionCatalog.class.getName(),
-        ImmutableMap.of(
-            "type", "hive",
-            "default-namespace", "default",
-            "clients", "1",
-            "parquet-enabled", "false",
-            "cache-enabled",
-                "false" // Spark will delete tables using v1, leaving the 
cache out of sync
-            ),
-        FileFormat.AVRO,
         false,
-        WRITE_DISTRIBUTION_MODE_RANGE,
-        false,
-        "test",
+        null,
         DISTRIBUTED,
         2
       },
-      {
-        "testhadoop",
-        SparkCatalog.class.getName(),
-        ImmutableMap.of("type", "hadoop"),
-        FileFormat.PARQUET,
-        true,
-        WRITE_DISTRIBUTION_MODE_HASH,
-        true,
-        null,
-        LOCAL,
-        3
-      },
-      {
-        "testhadoop",
-        SparkCatalog.class.getName(),
-        ImmutableMap.of("type", "hadoop"),
-        FileFormat.PARQUET,
-        false,
-        WRITE_DISTRIBUTION_MODE_HASH,
-        true,
-        null,
-        LOCAL,
-        3
-      },
       {
         "spark_catalog",
         SparkSessionCatalog.class.getName(),
         ImmutableMap.of(
             "type",
-            "hive",
+            "rest",
+            CatalogProperties.URI,
+            restCatalog.properties().get(CatalogProperties.URI),
             "default-namespace",
             "default",
-            "clients",
-            "1",
-            "parquet-enabled",
-            "false",
             "cache-enabled",
             "false" // Spark will delete tables using v1, leaving the cache 
out of sync
             ),
diff --git 
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
 
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
index ceb3077c56..996fb2636a 100644
--- 
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
+++ 
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
@@ -35,7 +35,6 @@ import org.apache.iceberg.Table;
 import org.apache.iceberg.TableUtil;
 import org.apache.iceberg.data.FileHelpers;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.spark.SparkCatalogConfig;
 import org.apache.iceberg.util.Pair;
 import org.apache.spark.sql.AnalysisException;
 import org.junit.jupiter.api.AfterEach;
@@ -222,12 +221,6 @@ public class TestRewriteTablePathProcedure extends 
ExtensionsTestBase {
 
     File file = new File(removePrefix(table.location()) + 
"/data/deletes.parquet");
     String filePath = file.toURI().toString();
-    if (SparkCatalogConfig.REST.catalogName().equals(catalogName)) {
-      // We applied this special handling because the base path for
-      // matching the RESTCATALOG's Hive BaseLocation is represented
-      // in the form of an AbsolutePath.
-      filePath = file.getAbsolutePath().toString();
-    }
 
     DeleteFile positionDeletes =
         FileHelpers.writeDeleteFile(table, table.io().newOutputFile(filePath), 
rowsToDelete)
diff --git 
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
 
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
index b5d6415763..72988ae0ed 100644
--- 
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
+++ 
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
@@ -46,11 +46,10 @@ import java.io.UncheckedIOException;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
-import java.util.Random;
 import java.util.Set;
 import java.util.UUID;
-import java.util.concurrent.ThreadLocalRandom;
 import java.util.stream.Collectors;
+import org.apache.iceberg.CatalogProperties;
 import org.apache.iceberg.DataFile;
 import org.apache.iceberg.FileFormat;
 import org.apache.iceberg.Files;
@@ -83,8 +82,6 @@ import org.junit.jupiter.api.extension.ExtendWith;
 @ExtendWith(ParameterizedTestExtension.class)
 public abstract class SparkRowLevelOperationsTestBase extends 
ExtensionsTestBase {
 
-  private static final Random RANDOM = ThreadLocalRandom.current();
-
   @Parameter(index = 3)
   protected FileFormat fileFormat;
 
@@ -135,67 +132,22 @@ public abstract class SparkRowLevelOperationsTestBase 
extends ExtensionsTestBase
             "default-namespace", "default"),
         FileFormat.PARQUET,
         true,
-        WRITE_DISTRIBUTION_MODE_NONE,
-        false,
-        "test",
-        DISTRIBUTED,
-        2
-      },
-      {
-        "testhadoop",
-        SparkCatalog.class.getName(),
-        ImmutableMap.of("type", "hadoop"),
-        FileFormat.PARQUET,
-        RANDOM.nextBoolean(),
         WRITE_DISTRIBUTION_MODE_HASH,
-        true,
-        null,
-        LOCAL,
-        2
-      },
-      {
-        "spark_catalog",
-        SparkSessionCatalog.class.getName(),
-        ImmutableMap.of(
-            "type", "hive",
-            "default-namespace", "default",
-            "clients", "1",
-            "parquet-enabled", "false",
-            "cache-enabled",
-                "false" // Spark will delete tables using v1, leaving the 
cache out of sync
-            ),
-        FileFormat.AVRO,
         false,
-        WRITE_DISTRIBUTION_MODE_RANGE,
-        false,
-        "test",
+        null,
         DISTRIBUTED,
         2
       },
-      {
-        "testhadoop",
-        SparkCatalog.class.getName(),
-        ImmutableMap.of("type", "hadoop"),
-        FileFormat.PARQUET,
-        RANDOM.nextBoolean(),
-        WRITE_DISTRIBUTION_MODE_HASH,
-        true,
-        null,
-        LOCAL,
-        3
-      },
       {
         "spark_catalog",
         SparkSessionCatalog.class.getName(),
         ImmutableMap.of(
             "type",
-            "hive",
+            "rest",
+            CatalogProperties.URI,
+            restCatalog.properties().get(CatalogProperties.URI),
             "default-namespace",
             "default",
-            "clients",
-            "1",
-            "parquet-enabled",
-            "false",
             "cache-enabled",
             "false" // Spark will delete tables using v1, leaving the cache 
out of sync
             ),
diff --git 
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
 
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
index 78f6b80ac9..cab33c8b00 100644
--- 
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
+++ 
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
@@ -35,7 +35,6 @@ import org.apache.iceberg.Table;
 import org.apache.iceberg.TableUtil;
 import org.apache.iceberg.data.FileHelpers;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.spark.SparkCatalogConfig;
 import org.apache.iceberg.util.Pair;
 import org.apache.spark.sql.AnalysisException;
 import org.junit.jupiter.api.AfterEach;
@@ -224,12 +223,6 @@ public class TestRewriteTablePathProcedure extends 
ExtensionsTestBase {
 
     File file = new File(removePrefix(table.location()) + 
"/data/deletes.parquet");
     String filePath = file.toURI().toString();
-    if (SparkCatalogConfig.REST.catalogName().equals(catalogName)) {
-      // We applied this special handling because the base path for
-      // matching the RESTCATALOG's Hive BaseLocation is represented
-      // in the form of an AbsolutePath.
-      filePath = file.getAbsolutePath().toString();
-    }
 
     DeleteFile positionDeletes =
         FileHelpers.writeDeleteFile(table, table.io().newOutputFile(filePath), 
rowsToDelete)
diff --git 
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
 
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
index b5d6415763..72988ae0ed 100644
--- 
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
+++ 
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
@@ -46,11 +46,10 @@ import java.io.UncheckedIOException;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
-import java.util.Random;
 import java.util.Set;
 import java.util.UUID;
-import java.util.concurrent.ThreadLocalRandom;
 import java.util.stream.Collectors;
+import org.apache.iceberg.CatalogProperties;
 import org.apache.iceberg.DataFile;
 import org.apache.iceberg.FileFormat;
 import org.apache.iceberg.Files;
@@ -83,8 +82,6 @@ import org.junit.jupiter.api.extension.ExtendWith;
 @ExtendWith(ParameterizedTestExtension.class)
 public abstract class SparkRowLevelOperationsTestBase extends 
ExtensionsTestBase {
 
-  private static final Random RANDOM = ThreadLocalRandom.current();
-
   @Parameter(index = 3)
   protected FileFormat fileFormat;
 
@@ -135,67 +132,22 @@ public abstract class SparkRowLevelOperationsTestBase 
extends ExtensionsTestBase
             "default-namespace", "default"),
         FileFormat.PARQUET,
         true,
-        WRITE_DISTRIBUTION_MODE_NONE,
-        false,
-        "test",
-        DISTRIBUTED,
-        2
-      },
-      {
-        "testhadoop",
-        SparkCatalog.class.getName(),
-        ImmutableMap.of("type", "hadoop"),
-        FileFormat.PARQUET,
-        RANDOM.nextBoolean(),
         WRITE_DISTRIBUTION_MODE_HASH,
-        true,
-        null,
-        LOCAL,
-        2
-      },
-      {
-        "spark_catalog",
-        SparkSessionCatalog.class.getName(),
-        ImmutableMap.of(
-            "type", "hive",
-            "default-namespace", "default",
-            "clients", "1",
-            "parquet-enabled", "false",
-            "cache-enabled",
-                "false" // Spark will delete tables using v1, leaving the 
cache out of sync
-            ),
-        FileFormat.AVRO,
         false,
-        WRITE_DISTRIBUTION_MODE_RANGE,
-        false,
-        "test",
+        null,
         DISTRIBUTED,
         2
       },
-      {
-        "testhadoop",
-        SparkCatalog.class.getName(),
-        ImmutableMap.of("type", "hadoop"),
-        FileFormat.PARQUET,
-        RANDOM.nextBoolean(),
-        WRITE_DISTRIBUTION_MODE_HASH,
-        true,
-        null,
-        LOCAL,
-        3
-      },
       {
         "spark_catalog",
         SparkSessionCatalog.class.getName(),
         ImmutableMap.of(
             "type",
-            "hive",
+            "rest",
+            CatalogProperties.URI,
+            restCatalog.properties().get(CatalogProperties.URI),
             "default-namespace",
             "default",
-            "clients",
-            "1",
-            "parquet-enabled",
-            "false",
             "cache-enabled",
             "false" // Spark will delete tables using v1, leaving the cache 
out of sync
             ),
diff --git 
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
 
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
index 78f6b80ac9..cab33c8b00 100644
--- 
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
+++ 
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
@@ -35,7 +35,6 @@ import org.apache.iceberg.Table;
 import org.apache.iceberg.TableUtil;
 import org.apache.iceberg.data.FileHelpers;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.spark.SparkCatalogConfig;
 import org.apache.iceberg.util.Pair;
 import org.apache.spark.sql.AnalysisException;
 import org.junit.jupiter.api.AfterEach;
@@ -224,12 +223,6 @@ public class TestRewriteTablePathProcedure extends 
ExtensionsTestBase {
 
     File file = new File(removePrefix(table.location()) + 
"/data/deletes.parquet");
     String filePath = file.toURI().toString();
-    if (SparkCatalogConfig.REST.catalogName().equals(catalogName)) {
-      // We applied this special handling because the base path for
-      // matching the RESTCATALOG's Hive BaseLocation is represented
-      // in the form of an AbsolutePath.
-      filePath = file.getAbsolutePath().toString();
-    }
 
     DeleteFile positionDeletes =
         FileHelpers.writeDeleteFile(table, table.io().newOutputFile(filePath), 
rowsToDelete)

(iceberg) branch main updated: Spark: Trim row-level test parameter rows from 6 to 3 (#16549)

Reply via email to