This is an automated email from the ASF dual-hosted git repository.
stevenzwu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 176c6e30f6 Spark: Trim row-level test parameter rows from 6 to 3
(#16549)
176c6e30f6 is described below
commit 176c6e30f61280c41cc93ea98a73aa3b9403fad5
Author: Steven Zhen Wu <[email protected]>
AuthorDate: Wed May 27 11:34:19 2026 -0700
Spark: Trim row-level test parameter rows from 6 to 3 (#16549)
* Spark: Trim row-level test parameter rows from 6 to 3
Reduces SparkRowLevelOperationsTestBase parameter rows from 6 to 3 in
v4.0 / v4.1 (and from 7 to 3 in v3.5), shifting from "test every catalog
backend" to "test the catalogs that matter for production":
- testhive (Hive) — kept as the established Hive metastore baseline
- testrest (REST) — added in place of testhadoop, since REST is the
OSS-strategic catalog and testhadoop isn't recommended for prod
- spark_catalog (REST-backed) — repointed from Hive to REST so the
SessionCatalog row exercises the REST commit path instead of Hive
formatVersion 2 covered by the testhive and testrest rows; formatVersion
3 covered by the spark_catalog/REST row, which exercises the DV
(deletion-vector) path that validateSnapshot checks via
formatVersion >= 3.
The trim affects 9 concrete subclasses (TestCopyOnWriteMerge/Update/Delete,
TestMergeOnReadMerge/Update/Delete, both *MergeMetrics, and
TestMergeSchemaEvolution), each cutting test invocations by 50% (~57%
on Spark 3.5).
Note: TestCopyOnWriteWithLineage and TestMergeOnReadWithLineage are
unaffected — TestRowLevelOperationsWithLineage redeclares parameters()
with its own row set.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* Replace non-session REST row with second testhive row
Drops the testrest row in favor of restoring testhive as the carrier for
the HASH/null/DISTRIBUTED axes that previously sat on testhadoop. REST
catalog coverage is preserved by the spark_catalog (REST-backed) row.
Rationale: for the row-level operation code paths these tests exercise,
a non-session SparkCatalog wrapper around REST adds little beyond what
the SessionCatalog wrapper already covers. Both return SparkTable; both
commit through the same MergingSnapshotProducer; the differences live
in table-resolution paths (DDL/aliasing), not in MERGE/UPDATE/DELETE.
Other test classes (TestStructuredStreamingRead3 etc.) already exercise
REST as a non-session catalog.
Eliminates 2 of the 3 known TestBase.move() / metadata-delete fixture
failures from the previous version. The remaining failure
(testDeleteWithoutScanningTable on spark_catalog/REST) is a pre-existing
TestBase.move() limitation with non-URI paths and needs a follow-up fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* Spark: Fix TestBase.move() to handle locations without URI schemes
Paths.get(URI.create(location)) requires the URI to carry a scheme.
HiveCatalog returns manifest paths with file:// schemes, so the
existing move() works. RESTCatalog (as configured by RESTServerExtension
in tests) returns plain local paths without a scheme, which makes
Paths.get(URI) throw IllegalArgumentException: Missing scheme.
Add a small toPath() helper that falls back to Paths.get(location) when
URI.create(location).getScheme() is null. This unblocks
testDeleteWithoutScanningTable and the equivalent MERGE-side test on
spark_catalog/REST rows.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* Build: Fix REST catalog test fixture warehouse path scheme
Move the URI-scheme handling from TestBase.move() (Spark) to the source
of the inconsistency: the REST catalog test fixture's default warehouse
path. RESTCatalogServer used getAbsolutePath() which returns plain
filesystem paths without a URI scheme; HiveCatalog and HadoopCatalog
return file:// paths via Hadoop's Path machinery, so test fixtures
that consume catalog paths (e.g. TestBase.move() which calls
Paths.get(URI.create(...))) work for those catalogs but break for REST.
Switch to toURI().toString() so the REST fixture matches the Hive and
Hadoop convention. testDeleteWithoutScanningTable and the equivalent
MERGE-side test now pass on the spark_catalog/REST row without needing
scheme-handling logic at every consumer.
Reverts the TestBase.move() change from the previous commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* Spark: Drop REST-only path workaround in TestRewriteTablePathProcedure
The test had a special-case branch that called file.getAbsolutePath() (no
scheme) instead of file.toURI().toString() when catalogName was testrest,
because the old REST fixture returned warehouse paths without a scheme
and the test needed the deletes.parquet path to match.
With RESTCatalogServer now using toURI().toString() for the default
warehouse, table.location() returns a scheme-prefixed path on REST too,
so the workaround is no longer needed - and is now incorrect, since
RewriteTablePathUtil.newPositionDeleteEntry validates that delete file
paths start with the table location prefix (which now includes file://).
Removing the special case lets the simple toURI().toString() path apply
uniformly across all catalogs.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
---
.../org/apache/iceberg/rest/RESTCatalogServer.java | 2 +-
.../SparkRowLevelOperationsTestBase.java | 70 ++--------------------
.../extensions/TestRewriteTablePathProcedure.java | 7 ---
.../SparkRowLevelOperationsTestBase.java | 58 ++----------------
.../extensions/TestRewriteTablePathProcedure.java | 7 ---
.../SparkRowLevelOperationsTestBase.java | 58 ++----------------
.../extensions/TestRewriteTablePathProcedure.java | 7 ---
7 files changed, 16 insertions(+), 193 deletions(-)
diff --git
a/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java
b/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java
index 2e4541b50b..daed482c74 100644
---
a/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java
+++
b/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java
@@ -92,7 +92,7 @@ public class RESTCatalogServer {
if (warehouseLocation == null) {
File tmp =
java.nio.file.Files.createTempDirectory("iceberg_warehouse").toFile();
tmp.deleteOnExit();
- warehouseLocation = new File(tmp, "iceberg_data").getAbsolutePath();
+ warehouseLocation = new File(tmp, "iceberg_data").toURI().toString();
catalogProperties.put(CatalogProperties.WAREHOUSE_LOCATION,
warehouseLocation);
LOG.info("No warehouse location set. Defaulting to temp location: {}",
warehouseLocation);
diff --git
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
index 893f9931cf..72988ae0ed 100644
---
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
+++
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
@@ -46,11 +46,10 @@ import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
-import java.util.Random;
import java.util.Set;
import java.util.UUID;
-import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
+import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Files;
@@ -83,8 +82,6 @@ import org.junit.jupiter.api.extension.ExtendWith;
@ExtendWith(ParameterizedTestExtension.class)
public abstract class SparkRowLevelOperationsTestBase extends
ExtensionsTestBase {
- private static final Random RANDOM = ThreadLocalRandom.current();
-
@Parameter(index = 3)
protected FileFormat fileFormat;
@@ -135,79 +132,22 @@ public abstract class SparkRowLevelOperationsTestBase
extends ExtensionsTestBase
"default-namespace", "default"),
FileFormat.PARQUET,
true,
- WRITE_DISTRIBUTION_MODE_NONE,
- false,
- "test",
- DISTRIBUTED,
- 2
- },
- {
- "testhadoop",
- SparkCatalog.class.getName(),
- ImmutableMap.of("type", "hadoop"),
- FileFormat.PARQUET,
- RANDOM.nextBoolean(),
WRITE_DISTRIBUTION_MODE_HASH,
- true,
- null,
- LOCAL,
- 2
- },
- {
- "spark_catalog",
- SparkSessionCatalog.class.getName(),
- ImmutableMap.of(
- "type", "hive",
- "default-namespace", "default",
- "clients", "1",
- "parquet-enabled", "false",
- "cache-enabled",
- "false" // Spark will delete tables using v1, leaving the
cache out of sync
- ),
- FileFormat.AVRO,
false,
- WRITE_DISTRIBUTION_MODE_RANGE,
- false,
- "test",
+ null,
DISTRIBUTED,
2
},
- {
- "testhadoop",
- SparkCatalog.class.getName(),
- ImmutableMap.of("type", "hadoop"),
- FileFormat.PARQUET,
- true,
- WRITE_DISTRIBUTION_MODE_HASH,
- true,
- null,
- LOCAL,
- 3
- },
- {
- "testhadoop",
- SparkCatalog.class.getName(),
- ImmutableMap.of("type", "hadoop"),
- FileFormat.PARQUET,
- false,
- WRITE_DISTRIBUTION_MODE_HASH,
- true,
- null,
- LOCAL,
- 3
- },
{
"spark_catalog",
SparkSessionCatalog.class.getName(),
ImmutableMap.of(
"type",
- "hive",
+ "rest",
+ CatalogProperties.URI,
+ restCatalog.properties().get(CatalogProperties.URI),
"default-namespace",
"default",
- "clients",
- "1",
- "parquet-enabled",
- "false",
"cache-enabled",
"false" // Spark will delete tables using v1, leaving the cache
out of sync
),
diff --git
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
index ceb3077c56..996fb2636a 100644
---
a/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
+++
b/spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
@@ -35,7 +35,6 @@ import org.apache.iceberg.Table;
import org.apache.iceberg.TableUtil;
import org.apache.iceberg.data.FileHelpers;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.spark.SparkCatalogConfig;
import org.apache.iceberg.util.Pair;
import org.apache.spark.sql.AnalysisException;
import org.junit.jupiter.api.AfterEach;
@@ -222,12 +221,6 @@ public class TestRewriteTablePathProcedure extends
ExtensionsTestBase {
File file = new File(removePrefix(table.location()) +
"/data/deletes.parquet");
String filePath = file.toURI().toString();
- if (SparkCatalogConfig.REST.catalogName().equals(catalogName)) {
- // We applied this special handling because the base path for
- // matching the RESTCATALOG's Hive BaseLocation is represented
- // in the form of an AbsolutePath.
- filePath = file.getAbsolutePath().toString();
- }
DeleteFile positionDeletes =
FileHelpers.writeDeleteFile(table, table.io().newOutputFile(filePath),
rowsToDelete)
diff --git
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
index b5d6415763..72988ae0ed 100644
---
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
+++
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
@@ -46,11 +46,10 @@ import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
-import java.util.Random;
import java.util.Set;
import java.util.UUID;
-import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
+import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Files;
@@ -83,8 +82,6 @@ import org.junit.jupiter.api.extension.ExtendWith;
@ExtendWith(ParameterizedTestExtension.class)
public abstract class SparkRowLevelOperationsTestBase extends
ExtensionsTestBase {
- private static final Random RANDOM = ThreadLocalRandom.current();
-
@Parameter(index = 3)
protected FileFormat fileFormat;
@@ -135,67 +132,22 @@ public abstract class SparkRowLevelOperationsTestBase
extends ExtensionsTestBase
"default-namespace", "default"),
FileFormat.PARQUET,
true,
- WRITE_DISTRIBUTION_MODE_NONE,
- false,
- "test",
- DISTRIBUTED,
- 2
- },
- {
- "testhadoop",
- SparkCatalog.class.getName(),
- ImmutableMap.of("type", "hadoop"),
- FileFormat.PARQUET,
- RANDOM.nextBoolean(),
WRITE_DISTRIBUTION_MODE_HASH,
- true,
- null,
- LOCAL,
- 2
- },
- {
- "spark_catalog",
- SparkSessionCatalog.class.getName(),
- ImmutableMap.of(
- "type", "hive",
- "default-namespace", "default",
- "clients", "1",
- "parquet-enabled", "false",
- "cache-enabled",
- "false" // Spark will delete tables using v1, leaving the
cache out of sync
- ),
- FileFormat.AVRO,
false,
- WRITE_DISTRIBUTION_MODE_RANGE,
- false,
- "test",
+ null,
DISTRIBUTED,
2
},
- {
- "testhadoop",
- SparkCatalog.class.getName(),
- ImmutableMap.of("type", "hadoop"),
- FileFormat.PARQUET,
- RANDOM.nextBoolean(),
- WRITE_DISTRIBUTION_MODE_HASH,
- true,
- null,
- LOCAL,
- 3
- },
{
"spark_catalog",
SparkSessionCatalog.class.getName(),
ImmutableMap.of(
"type",
- "hive",
+ "rest",
+ CatalogProperties.URI,
+ restCatalog.properties().get(CatalogProperties.URI),
"default-namespace",
"default",
- "clients",
- "1",
- "parquet-enabled",
- "false",
"cache-enabled",
"false" // Spark will delete tables using v1, leaving the cache
out of sync
),
diff --git
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
index 78f6b80ac9..cab33c8b00 100644
---
a/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
+++
b/spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
@@ -35,7 +35,6 @@ import org.apache.iceberg.Table;
import org.apache.iceberg.TableUtil;
import org.apache.iceberg.data.FileHelpers;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.spark.SparkCatalogConfig;
import org.apache.iceberg.util.Pair;
import org.apache.spark.sql.AnalysisException;
import org.junit.jupiter.api.AfterEach;
@@ -224,12 +223,6 @@ public class TestRewriteTablePathProcedure extends
ExtensionsTestBase {
File file = new File(removePrefix(table.location()) +
"/data/deletes.parquet");
String filePath = file.toURI().toString();
- if (SparkCatalogConfig.REST.catalogName().equals(catalogName)) {
- // We applied this special handling because the base path for
- // matching the RESTCATALOG's Hive BaseLocation is represented
- // in the form of an AbsolutePath.
- filePath = file.getAbsolutePath().toString();
- }
DeleteFile positionDeletes =
FileHelpers.writeDeleteFile(table, table.io().newOutputFile(filePath),
rowsToDelete)
diff --git
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
index b5d6415763..72988ae0ed 100644
---
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
+++
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/SparkRowLevelOperationsTestBase.java
@@ -46,11 +46,10 @@ import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
-import java.util.Random;
import java.util.Set;
import java.util.UUID;
-import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;
+import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.Files;
@@ -83,8 +82,6 @@ import org.junit.jupiter.api.extension.ExtendWith;
@ExtendWith(ParameterizedTestExtension.class)
public abstract class SparkRowLevelOperationsTestBase extends
ExtensionsTestBase {
- private static final Random RANDOM = ThreadLocalRandom.current();
-
@Parameter(index = 3)
protected FileFormat fileFormat;
@@ -135,67 +132,22 @@ public abstract class SparkRowLevelOperationsTestBase
extends ExtensionsTestBase
"default-namespace", "default"),
FileFormat.PARQUET,
true,
- WRITE_DISTRIBUTION_MODE_NONE,
- false,
- "test",
- DISTRIBUTED,
- 2
- },
- {
- "testhadoop",
- SparkCatalog.class.getName(),
- ImmutableMap.of("type", "hadoop"),
- FileFormat.PARQUET,
- RANDOM.nextBoolean(),
WRITE_DISTRIBUTION_MODE_HASH,
- true,
- null,
- LOCAL,
- 2
- },
- {
- "spark_catalog",
- SparkSessionCatalog.class.getName(),
- ImmutableMap.of(
- "type", "hive",
- "default-namespace", "default",
- "clients", "1",
- "parquet-enabled", "false",
- "cache-enabled",
- "false" // Spark will delete tables using v1, leaving the
cache out of sync
- ),
- FileFormat.AVRO,
false,
- WRITE_DISTRIBUTION_MODE_RANGE,
- false,
- "test",
+ null,
DISTRIBUTED,
2
},
- {
- "testhadoop",
- SparkCatalog.class.getName(),
- ImmutableMap.of("type", "hadoop"),
- FileFormat.PARQUET,
- RANDOM.nextBoolean(),
- WRITE_DISTRIBUTION_MODE_HASH,
- true,
- null,
- LOCAL,
- 3
- },
{
"spark_catalog",
SparkSessionCatalog.class.getName(),
ImmutableMap.of(
"type",
- "hive",
+ "rest",
+ CatalogProperties.URI,
+ restCatalog.properties().get(CatalogProperties.URI),
"default-namespace",
"default",
- "clients",
- "1",
- "parquet-enabled",
- "false",
"cache-enabled",
"false" // Spark will delete tables using v1, leaving the cache
out of sync
),
diff --git
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
index 78f6b80ac9..cab33c8b00 100644
---
a/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
+++
b/spark/v4.1/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteTablePathProcedure.java
@@ -35,7 +35,6 @@ import org.apache.iceberg.Table;
import org.apache.iceberg.TableUtil;
import org.apache.iceberg.data.FileHelpers;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
-import org.apache.iceberg.spark.SparkCatalogConfig;
import org.apache.iceberg.util.Pair;
import org.apache.spark.sql.AnalysisException;
import org.junit.jupiter.api.AfterEach;
@@ -224,12 +223,6 @@ public class TestRewriteTablePathProcedure extends
ExtensionsTestBase {
File file = new File(removePrefix(table.location()) +
"/data/deletes.parquet");
String filePath = file.toURI().toString();
- if (SparkCatalogConfig.REST.catalogName().equals(catalogName)) {
- // We applied this special handling because the base path for
- // matching the RESTCATALOG's Hive BaseLocation is represented
- // in the form of an AbsolutePath.
- filePath = file.getAbsolutePath().toString();
- }
DeleteFile positionDeletes =
FileHelpers.writeDeleteFile(table, table.io().newOutputFile(filePath),
rowsToDelete)