This is an automated email from the ASF dual-hosted git repository.
mbutrovich pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new 2106cefd1 docs: document iceberg spark tests in contributor guide
(#3777)
2106cefd1 is described below
commit 2106cefd10236a1a4385ce1295ca3bdf62d33d08
Author: Matt Butrovich <[email protected]>
AuthorDate: Mon Mar 23 16:04:32 2026 -0400
docs: document iceberg spark tests in contributor guide (#3777)
---
.../contributor-guide/iceberg-spark-tests.md | 96 ++++++++++++++++++++++
1 file changed, 96 insertions(+)
diff --git a/docs/source/contributor-guide/iceberg-spark-tests.md
b/docs/source/contributor-guide/iceberg-spark-tests.md
new file mode 100644
index 000000000..5cc5690f4
--- /dev/null
+++ b/docs/source/contributor-guide/iceberg-spark-tests.md
@@ -0,0 +1,96 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Running Iceberg Spark Tests
+
+Running Apache Iceberg's Spark tests with Comet enabled is a good way to
ensure that Comet produces the same
+results as Spark when reading Iceberg tables. To enable this, we apply diff
files to the Apache Iceberg source
+code so that Comet is loaded when we run the tests.
+
+Here is an overview of the changes that the diffs make to Iceberg:
+
+- Configure Comet as a dependency and set the correct version in
`libs.versions.toml` and `build.gradle`
+- Delete upstream Comet reader classes that reference legacy Comet APIs
removed in [#3739]. These classes were
+ added upstream in [apache/iceberg#15674] and depend on Comet's old Iceberg
Java integration. Since Comet now
+ uses a native Iceberg scan, these classes fail to compile and must be
removed.
+- Configure test base classes (`TestBase`, `ExtensionsTestBase`,
`ScanTestBase`, etc.) to load the Comet Spark
+ plugin and shuffle manager
+
+[#3739]: https://github.com/apache/datafusion-comet/pull/3739
+[apache/iceberg#15674]: https://github.com/apache/iceberg/pull/15674
+
+## 1. Install Comet
+
+Run `make release` in Comet to install the Comet JAR into the local Maven
repository, specifying the Spark version.
+
+```shell
+PROFILES="-Pspark-3.5" make release
+```
+
+## 2. Clone Iceberg and Apply Diff
+
+Clone Apache Iceberg locally and apply the diff file from Comet against the
matching tag.
+
+```shell
+git clone [email protected]:apache/iceberg.git apache-iceberg
+cd apache-iceberg
+git checkout apache-iceberg-1.8.1
+git apply ../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+```
+
+## 3. Run Iceberg Spark Tests
+
+```shell
+ENABLE_COMET=true ./gradlew -DsparkVersions=3.5 -DscalaVersion=2.13
-DflinkVersions= -DkafkaVersions= \
+ :iceberg-spark:iceberg-spark-3.5_2.13:test \
+ -Pquick=true -x javadoc
+```
+
+The three Gradle targets tested in CI are:
+
+- `:iceberg-spark:iceberg-spark-<sparkVersion>_<scalaVersion>:test`
+- `:iceberg-spark:iceberg-spark-extensions-<sparkVersion>_<scalaVersion>:test`
+-
`:iceberg-spark:iceberg-spark-runtime-<sparkVersion>_<scalaVersion>:integrationTest`
+
+## Updating Diffs
+
+To update a diff (e.g. after modifying test configuration), apply the existing
diff, make changes, then
+regenerate:
+
+```shell
+cd apache-iceberg
+git reset --hard apache-iceberg-1.8.1 && git clean -fd
+git apply ../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+
+# Make changes, then run spotless to fix formatting
+./gradlew spotlessApply
+
+# Stage any new or deleted files, then generate the diff
+git add -A
+git diff apache-iceberg-1.8.1 >
../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+```
+
+Repeat for each Iceberg version (1.8.1, 1.9.1, 1.10.0). The file contents
differ between versions, so each
+diff must be generated against its own tag.
+
+## Running Tests in CI
+
+The `iceberg_spark_test.yml` workflow applies these diffs and runs the three
Gradle targets above against
+each Iceberg version. The test matrix covers Spark 3.4 and 3.5 across Iceberg
1.8.1, 1.9.1, and 1.10.0
+with Java 11 and 17. The workflow only runs when the PR title contains
`[iceberg]`.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]