This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 636f9b001 Publish built docs triggered by 2106cefd10236a1a4385ce1295ca3bdf62d33d08
636f9b001 is described below
commit 636f9b001960d72006f8764afe98d6e210855500
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Mar 23 20:08:21 2026 +0000
Publish built docs triggered by 2106cefd10236a1a4385ce1295ca3bdf62d33d08
---
.../contributor-guide/iceberg-spark-tests.md.txt | 96 ++++
_sources/user-guide/latest/compatibility.md.txt | 2 +-
_sources/user-guide/latest/configs.md.txt | 1 +
contributor-guide/iceberg-spark-tests.html | 550 +++++++++++++++++++++
objects.inv | Bin 1619 -> 1637 bytes
searchindex.js | 2 +-
user-guide/latest/compatibility.html | 2 +-
user-guide/latest/configs.html | 20 +-
8 files changed, 662 insertions(+), 11 deletions(-)
diff --git a/_sources/contributor-guide/iceberg-spark-tests.md.txt b/_sources/contributor-guide/iceberg-spark-tests.md.txt
new file mode 100644
index 000000000..5cc5690f4
--- /dev/null
+++ b/_sources/contributor-guide/iceberg-spark-tests.md.txt
@@ -0,0 +1,96 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Running Iceberg Spark Tests
+
+Running Apache Iceberg's Spark tests with Comet enabled is a good way to ensure that Comet produces the same
+results as Spark when reading Iceberg tables. To enable this, we apply diff files to the Apache Iceberg source
+code so that Comet is loaded when we run the tests.
+
+Here is an overview of the changes that the diffs make to Iceberg:
+
+- Configure Comet as a dependency and set the correct version in `libs.versions.toml` and `build.gradle`
+- Delete upstream Comet reader classes that reference legacy Comet APIs removed in [#3739]. These classes were
+ added upstream in [apache/iceberg#15674] and depend on Comet's old Iceberg Java integration. Since Comet now
+ uses a native Iceberg scan, these classes fail to compile and must be removed.
+- Configure test base classes (`TestBase`, `ExtensionsTestBase`, `ScanTestBase`, etc.) to load the Comet Spark
+ plugin and shuffle manager
+
+[#3739]: https://github.com/apache/datafusion-comet/pull/3739
+[apache/iceberg#15674]: https://github.com/apache/iceberg/pull/15674
+
+## 1. Install Comet
+
+Run `make release` in your Comet checkout to install the Comet JAR into the local Maven repository, selecting the
+Spark version with the `PROFILES` variable.
+
+```shell
+PROFILES="-Pspark-3.5" make release
+```
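To confirm that the install step actually placed a Comet JAR in the local Maven repository, a small lookup helper can be used. This is a minimal sketch: `find_comet_jars` is a hypothetical name, and both the `org/apache/datafusion` group path and the `comet-spark-*.jar` file pattern are assumptions about Comet's Maven coordinates that you may need to adjust.

```shell
# Hypothetical helper: list Comet JARs installed in a local Maven repository.
# ASSUMPTION: Comet artifacts live under the group path org/apache/datafusion
# and their JAR files are named comet-spark-*.jar.
# Usage: find_comet_jars [MAVEN_REPO_DIR]   (defaults to ~/.m2/repository)
find_comet_jars() {
  local repo="${1:-$HOME/.m2/repository}"
  # Print nothing (and succeed) if the group directory does not exist yet.
  [ -d "$repo/org/apache/datafusion" ] || return 0
  find "$repo/org/apache/datafusion" -type f -name 'comet-spark-*.jar'
}
```

Running `find_comet_jars` right after `make release` and getting empty output suggests the JAR was not installed where expected.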
+
+## 2. Clone Iceberg and Apply Diff
+
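+Clone Apache Iceberg locally, check out the release tag that matches the diff, and apply the diff file from Comet.
+The commands below assume the Comet checkout is a sibling directory named `datafusion-comet`.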
+
+```shell
+git clone git@github.com:apache/iceberg.git apache-iceberg
+cd apache-iceberg
+git checkout apache-iceberg-1.8.1
+git apply ../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+```
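When repeating this step for other Iceberg versions, the commands can be parameterized by version. The sketch below is a hypothetical helper (`setup_iceberg` is not part of Comet); it assumes the `apache-iceberg-<version>` tag naming and the `dev/diffs/iceberg-rust/<version>.diff` location used above, clones over HTTPS rather than SSH, and runs `git apply --check` first so that a diff which no longer applies fails before the working tree is touched.

```shell
# Hypothetical helper: check out an Iceberg release tag and apply Comet's diff.
# Usage: setup_iceberg VERSION COMET_DIR [ICEBERG_DIR]
setup_iceberg() {
  local version="$1" comet_dir iceberg_dir="${3:-apache-iceberg}" diff_file
  # Resolve the Comet checkout to an absolute path, because git -C changes
  # the working directory before reading the patch file.
  comet_dir=$(cd "$2" && pwd) || return 1
  diff_file="$comet_dir/dev/diffs/iceberg-rust/$version.diff"

  # Clone only if the target directory does not exist yet.
  if [ ! -d "$iceberg_dir" ]; then
    git clone https://github.com/apache/iceberg.git "$iceberg_dir" || return 1
  fi

  git -C "$iceberg_dir" checkout --quiet "apache-iceberg-$version" || return 1
  # Dry run: fail early if the diff does not apply cleanly to this tag.
  git -C "$iceberg_dir" apply --check "$diff_file" || return 1
  git -C "$iceberg_dir" apply "$diff_file"
}
```

For example, `setup_iceberg 1.9.1 ../datafusion-comet apache-iceberg-1.9.1` would prepare a separate checkout for 1.9.1. Keeping one directory per version avoids carrying applied changes from one tag to the next.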
+
+## 3. Run Iceberg Spark Tests
+
+```shell
+ENABLE_COMET=true ./gradlew -DsparkVersions=3.5 -DscalaVersion=2.13 -DflinkVersions= -DkafkaVersions= \
+ :iceberg-spark:iceberg-spark-3.5_2.13:test \
+ -Pquick=true -x javadoc
+```
+
+The three Gradle targets tested in CI are:
+
+- `:iceberg-spark:iceberg-spark-<sparkVersion>_<scalaVersion>:test`
+- `:iceberg-spark:iceberg-spark-extensions-<sparkVersion>_<scalaVersion>:test`
+- `:iceberg-spark:iceberg-spark-runtime-<sparkVersion>_<scalaVersion>:integrationTest`
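The three target names differ only in module prefix and task, so they can be generated from the Spark and Scala versions. This is a minimal sketch; `iceberg_spark_targets` is a hypothetical helper, and it assumes the naming pattern in the list above holds for whatever Spark/Scala combination you pass in.

```shell
# Hypothetical helper: print the three CI Gradle targets, one per line.
# Usage: iceberg_spark_targets SPARK_VERSION SCALA_VERSION
iceberg_spark_targets() {
  local spark="$1" scala="$2"
  echo ":iceberg-spark:iceberg-spark-${spark}_${scala}:test"
  echo ":iceberg-spark:iceberg-spark-extensions-${spark}_${scala}:test"
  echo ":iceberg-spark:iceberg-spark-runtime-${spark}_${scala}:integrationTest"
}
```

For example, `ENABLE_COMET=true ./gradlew -DsparkVersions=3.5 -DscalaVersion=2.13 -DflinkVersions= -DkafkaVersions= $(iceberg_spark_targets 3.5 2.13) -Pquick=true -x javadoc` should run all three targets in one Gradle invocation. The command substitution is deliberately unquoted so that each target becomes a separate argument.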
+
+## Updating Diffs
+
+To update a diff (e.g., after modifying test configuration), apply the existing diff, make your changes, then
+regenerate it:
+
+```shell
+cd apache-iceberg
+git reset --hard apache-iceberg-1.8.1 && git clean -fd
+git apply ../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+
+# Make changes, then run spotless to fix formatting
+./gradlew spotlessApply
+
+# Stage any new or deleted files, then generate the diff
+git add -A
+git diff apache-iceberg-1.8.1 > ../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+```
+
+Repeat for each Iceberg version (1.8.1, 1.9.1, and 1.10.0), substituting the version in both the tag and the diff
+path. The file contents differ between versions, so each diff must be generated against its own tag.
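Because each diff is tied to its own tag, it can help to confirm after an edit that every diff still applies cleanly to its tag. The loop below is a sketch of such a check, not a script shipped with Comet; `check_iceberg_diffs` is a hypothetical name, and it assumes the same tag and diff-path conventions as the commands above. It uses `git checkout --force` rather than `git reset --hard` so that it never moves a branch, but it still discards uncommitted work in the Iceberg checkout.

```shell
# Hypothetical helper: verify that each version's diff applies to its own tag.
# Usage: check_iceberg_diffs ICEBERG_DIR COMET_DIR VERSION...
# WARNING: force-checks-out and cleans ICEBERG_DIR, discarding local changes.
check_iceberg_diffs() {
  local iceberg_dir="$1" comet_dir v rc=0
  comet_dir=$(cd "$2" && pwd) || return 1
  shift 2
  for v in "$@"; do
    if ! git -C "$iceberg_dir" checkout --force --quiet "apache-iceberg-$v"; then
      echo "FAIL $v (missing tag)"
      rc=1
      continue
    fi
    git -C "$iceberg_dir" clean -fdq
    # --check is a dry run: it reports conflicts without modifying files.
    if git -C "$iceberg_dir" apply --check "$comet_dir/dev/diffs/iceberg-rust/$v.diff"; then
      echo "ok   $v"
    else
      echo "FAIL $v"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_iceberg_diffs apache-iceberg ../datafusion-comet 1.8.1 1.9.1 1.10.0` prints one line per version and returns non-zero if any diff fails to apply.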
+
+## Running Tests in CI
+
+The `iceberg_spark_test.yml` workflow applies these diffs and runs the three Gradle targets above against
+each Iceberg version. The test matrix covers Spark 3.4 and 3.5 across Iceberg 1.8.1, 1.9.1, and 1.10.0
+with Java 11 and 17. The workflow only runs when the PR title contains `[iceberg]`.
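Before opening a PR, a quick local check of the title can confirm that it will trigger the Iceberg workflow. The function below is a hypothetical helper, not part of the workflow; it assumes the trigger is a plain, case-sensitive substring match on `[iceberg]`, which may differ from the condition actually used in `iceberg_spark_test.yml`.

```shell
# Hypothetical helper: succeed if a PR title contains the literal "[iceberg]".
# ASSUMPTION: case-sensitive substring match; the real workflow may differ.
# The quotes in the case pattern matter: unquoted, [iceberg] would be a
# bracket expression matching any one of the letters i, c, e, b, r, g.
is_iceberg_pr_title() {
  case "$1" in
    *"[iceberg]"*) return 0 ;;
    *) return 1 ;;
  esac
}
```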
diff --git a/_sources/user-guide/latest/compatibility.md.txt b/_sources/user-guide/latest/compatibility.md.txt
index a10146234..c8edd7cd8 100644
--- a/_sources/user-guide/latest/compatibility.md.txt
+++ b/_sources/user-guide/latest/compatibility.md.txt
@@ -245,7 +245,7 @@ or strings containing null bytes (e.g \\u0000)
- **string -> date**: Only supports years between 262143 BC and 262142 AD
- **string -> decimal**: Does not support fullwidth unicode digits (e.g \\uFF10)
or strings containing null bytes (e.g \\u0000)
-- **string -> timestamp**: ANSI mode not supported
+- **string -> timestamp**: Not all valid formats are supported
<!-- prettier-ignore-end -->
<!--END:CAST_ANSI_TABLE-->
diff --git a/_sources/user-guide/latest/configs.md.txt b/_sources/user-guide/latest/configs.md.txt
index 810efe62f..822e68a13 100644
--- a/_sources/user-guide/latest/configs.md.txt
+++ b/_sources/user-guide/latest/configs.md.txt
@@ -102,6 +102,7 @@ These settings can be used to determine which parts of the plan are accelerated
| `spark.comet.columnar.shuffle.batch.size` | Batch size when writing out sorted spill files on the native side. Note that this should not be larger than batch size (i.e., `spark.comet.batchSize`). Otherwise it will produce larger batches than expected in the native operator after shuffle. | 8192 |
| `spark.comet.exec.shuffle.compression.codec` | The codec of Comet native shuffle used to compress shuffle data. lz4, zstd, and snappy are supported. Compression can be disabled by setting spark.shuffle.compress=false. | lz4 |
| `spark.comet.exec.shuffle.compression.zstd.level` | The compression level to use when compressing shuffle files with zstd. | 1 |
+| `spark.comet.exec.shuffle.directRead.enabled` | When enabled, native operators that consume shuffle output will read compressed shuffle blocks directly in native code, bypassing Arrow FFI. Applies to both native shuffle and JVM columnar shuffle. Requires spark.comet.exec.shuffle.enabled to be true. | true |
| `spark.comet.exec.shuffle.enabled` | Whether to enable Comet native shuffle. Note that this requires setting `spark.shuffle.manager` to `org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager`. `spark.shuffle.manager` must be set before starting the Spark application and cannot be changed during the application. | true |
| `spark.comet.exec.shuffle.writeBufferSize` | Size of the write buffer in bytes used by the native shuffle writer when writing shuffle data to disk. Larger values may improve write performance by reducing the number of system calls, but will use more memory. The default is 1MB which provides a good balance between performance and memory usage. | 1048576b |
| `spark.comet.native.shuffle.partitioning.hash.enabled` | Whether to enable hash partitioning for Comet native shuffle. | true |
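The new `spark.comet.exec.shuffle.directRead.enabled` setting only takes effect when Comet native shuffle is enabled, which in turn requires `spark.shuffle.manager` to be set before the application starts. A minimal sketch for passing these settings together (for example, to turn direct read off while comparing behavior) is shown below; `comet_shuffle_conf_args` is a hypothetical helper, and a real session also needs the Comet JAR and plugin settings described in Comet's installation guide.

```shell
# Hypothetical helper: print spark-shell/spark-submit --conf arguments that
# enable Comet native shuffle, with direct read on (default) or off.
# Usage: comet_shuffle_conf_args [true|false]
comet_shuffle_conf_args() {
  local direct_read="${1:-true}"
  printf '%s\n' \
    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
    --conf spark.comet.exec.shuffle.enabled=true \
    --conf "spark.comet.exec.shuffle.directRead.enabled=$direct_read"
}
```

For example, `spark-shell --jars "$COMET_JAR" --conf spark.plugins=org.apache.spark.CometPlugin $(comet_shuffle_conf_args false)`, where `$COMET_JAR` is a placeholder for the path to your Comet JAR.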
diff --git a/contributor-guide/iceberg-spark-tests.html b/contributor-guide/iceberg-spark-tests.html
new file mode 100644
index 000000000..da65f546b
--- /dev/null
+++ b/contributor-guide/iceberg-spark-tests.html
@@ -0,0 +1,550 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+<!DOCTYPE html>
+
+
+<html lang="en" data-content_root="../" data-theme="light">
+
+ <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0"
/><meta name="viewport" content="width=device-width, initial-scale=1" />
+
+ <title>Running Iceberg Spark Tests — Apache DataFusion Comet
documentation</title>
+
+
+
+ <script data-cfasync="false">
+ document.documentElement.dataset.mode = localStorage.getItem("mode") ||
"light";
+ document.documentElement.dataset.theme = localStorage.getItem("theme") ||
"light";
+ </script>
+ <!--
+ this give us a css class that will be invisible only if js is disabled
+ -->
+ <noscript>
+ <style>
+ .pst-js-only { display: none !important; }
+
+ </style>
+ </noscript>
+
+ <!-- Loaded before other Sphinx assets -->
+ <link href="../_static/styles/theme.css?digest=8878045cc6db502f8baf"
rel="stylesheet" />
+<link
href="../_static/styles/pydata-sphinx-theme.css?digest=8878045cc6db502f8baf"
rel="stylesheet" />
+
+ <link rel="stylesheet" type="text/css"
href="../_static/pygments.css?v=8f2a1f02" />
+ <link rel="stylesheet" type="text/css"
href="../_static/theme_overrides.css?v=cd442bcd" />
+
+ <!-- So that users can add custom icons -->
+ <script
src="../_static/scripts/fontawesome.js?digest=8878045cc6db502f8baf"></script>
+ <!-- Pre-loaded scripts that we'll load fully later -->
+ <link rel="preload" as="script"
href="../_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf" />
+<link rel="preload" as="script"
href="../_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf" />
+
+ <script src="../_static/documentation_options.js?v=5929fcd5"></script>
+ <script src="../_static/doctools.js?v=9a2dae69"></script>
+ <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
+ <script>DOCUMENTATION_OPTIONS.pagename =
'contributor-guide/iceberg-spark-tests';</script>
+ <script async="true" defer="true"
src="https://buttons.github.io/buttons.js"></script>
+ <link rel="index" title="Index" href="../genindex.html" />
+ <link rel="search" title="Search" href="../search.html" />
+ <meta name="viewport" content="width=device-width, initial-scale=1"/>
+ <meta name="docsearch:language" content="en"/>
+ <meta name="docsearch:version" content="" />
+ </head>
+
+
+ <body data-bs-spy="scroll" data-bs-target=".bd-toc-nav" data-offset="180"
data-bs-root-margin="0px 0px -60%" data-default-mode="light">
+
+
+
+ <div id="pst-skip-link" class="skip-link d-print-none"><a
href="#main-content">Skip to main content</a></div>
+
+ <div id="pst-scroll-pixel-helper"></div>
+
+ <button type="button" class="btn rounded-pill" id="pst-back-to-top">
+ <i class="fa-solid fa-arrow-up"></i>Back to top</button>
+
+
+ <dialog id="pst-search-dialog">
+
+<form class="bd-search d-flex align-items-center"
+ action="../search.html"
+ method="get">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <input type="search"
+ class="form-control"
+ name="q"
+ placeholder="Search the docs ..."
+ aria-label="Search the docs ..."
+ autocomplete="off"
+ autocorrect="off"
+ autocapitalize="off"
+ spellcheck="false"/>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd>K</kbd></span>
+</form>
+ </dialog>
+
+ <div class="pst-async-banner-revealer d-none">
+ <aside id="bd-header-version-warning" class="d-none d-print-none"
aria-label="Version warning"></aside>
+</div>
+
+
+ <header class="bd-header navbar navbar-expand-lg bd-navbar d-print-none">
+<div class="bd-header__inner bd-page-width">
+ <button class="pst-navbar-icon sidebar-toggle primary-toggle"
aria-label="Site navigation">
+ <span class="fa-solid fa-bars"></span>
+ </button>
+
+
+ <div class="col-lg-3 navbar-header-items__start">
+
+ <div class="navbar-item">
+
+
+
+
+
+<a class="navbar-brand logo" href="../index.html">
+
+
+
+
+
+
+
+
+ <img src="../_static/DataFusionComet-Logo-Light.png" class="logo__image
only-light" alt="Apache DataFusion Comet documentation - Home"/>
+ <img src="../_static/DataFusionComet-Logo-Dark.png" class="logo__image
only-dark pst-js-only" alt="Apache DataFusion Comet documentation - Home"/>
+
+
+</a></div>
+
+ </div>
+
+ <div class="col-lg-9 navbar-header-items">
+
+ <div class="me-auto navbar-header-items__center">
+
+ <div class="navbar-item">
+<nav>
+ <ul class="bd-navbar-elements navbar-nav">
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../about/index.html">
+ Comet Overview
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../user-guide/index.html">
+ User Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="index.html">
+ Contributor Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../asf/index.html">
+ ASF Links
+ </a>
+</li>
+
+ </ul>
+</nav></div>
+
+ </div>
+
+
+ <div class="navbar-header-items__end">
+
+ <div class="navbar-item navbar-persistent--container">
+
+
+<button class="btn search-button-field search-button__button pst-js-only"
title="Search" aria-label="Search" data-bs-placement="bottom"
data-bs-toggle="tooltip">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <span class="search-button__default-text">Search</span>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd
class="kbd-shortcut__modifier">K</kbd></span>
+</button>
+ </div>
+
+
+ <div class="navbar-item"><ul class="navbar-icon-links"
+ aria-label="Icon Links">
+ <li class="nav-item">
+
+
+
+
+
+
+
+
+ <a href="https://github.com/apache/datafusion-comet" title="GitHub"
class="nav-link pst-navbar-icon" rel="noopener" target="_blank"
data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands
fa-github fa-lg" aria-hidden="true"></i>
+ <span class="sr-only">GitHub</span></a>
+ </li>
+</ul></div>
+
+ <div class="navbar-item">
+
+<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button
pst-js-only" aria-label="Color mode" data-bs-title="Color mode"
data-bs-placement="bottom" data-bs-toggle="tooltip">
+ <i class="theme-switch fa-solid fa-sun fa-lg"
data-mode="light" title="Light"></i>
+ <i class="theme-switch fa-solid fa-moon fa-lg"
data-mode="dark" title="Dark"></i>
+ <i class="theme-switch fa-solid fa-circle-half-stroke fa-lg"
data-mode="auto" title="System Settings"></i>
+</button></div>
+
+ </div>
+
+ </div>
+
+
+ <div class="navbar-persistent--mobile">
+
+<button class="btn search-button-field search-button__button pst-js-only"
title="Search" aria-label="Search" data-bs-placement="bottom"
data-bs-toggle="tooltip">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <span class="search-button__default-text">Search</span>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd
class="kbd-shortcut__modifier">K</kbd></span>
+</button>
+ </div>
+
+
+
+</div>
+
+ </header>
+
+
+ <div class="bd-container">
+ <div class="bd-container__inner bd-page-width">
+
+
+
+
+
+ <dialog id="pst-primary-sidebar-modal"></dialog>
+ <div id="pst-primary-sidebar" class="bd-sidebar-primary bd-sidebar">
+
+
+
+ <div class="sidebar-header-items sidebar-primary__section">
+
+
+ <div class="sidebar-header-items__center">
+
+
+
+ <div class="navbar-item">
+<nav>
+ <ul class="bd-navbar-elements navbar-nav">
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../about/index.html">
+ Comet Overview
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../user-guide/index.html">
+ User Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="index.html">
+ Contributor Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../asf/index.html">
+ ASF Links
+ </a>
+</li>
+
+ </ul>
+</nav></div>
+
+
+ </div>
+
+
+
+ <div class="sidebar-header-items__end">
+
+ <div class="navbar-item"><ul class="navbar-icon-links"
+ aria-label="Icon Links">
+ <li class="nav-item">
+
+
+
+
+
+
+
+
+ <a href="https://github.com/apache/datafusion-comet" title="GitHub"
class="nav-link pst-navbar-icon" rel="noopener" target="_blank"
data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands
fa-github fa-lg" aria-hidden="true"></i>
+ <span class="sr-only">GitHub</span></a>
+ </li>
+</ul></div>
+
+ <div class="navbar-item">
+
+<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button
pst-js-only" aria-label="Color mode" data-bs-title="Color mode"
data-bs-placement="bottom" data-bs-toggle="tooltip">
+ <i class="theme-switch fa-solid fa-sun fa-lg"
data-mode="light" title="Light"></i>
+ <i class="theme-switch fa-solid fa-moon fa-lg"
data-mode="dark" title="Dark"></i>
+ <i class="theme-switch fa-solid fa-circle-half-stroke fa-lg"
data-mode="auto" title="System Settings"></i>
+</button></div>
+
+ </div>
+
+ </div>
+
+ <div class="sidebar-primary-items__start sidebar-primary__section">
+ <div class="sidebar-primary-item"><!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
+ <div class="bd-toc-item active">
+ <p aria-level="2" class="caption" role="heading"><span
class="caption-text">Index</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal"
href="../about/index.html">Comet Overview</a></li>
+<li class="toctree-l1"><a class="reference internal"
href="../user-guide/index.html">User Guide</a></li>
+<li class="toctree-l1"><a class="reference internal"
href="index.html">Contributor Guide</a></li>
+<li class="toctree-l1"><a class="reference internal"
href="../asf/index.html">ASF Links</a></li>
+</ul>
+
+ </div>
+</nav>
+</div>
+ </div>
+
+
+ <div class="sidebar-primary-items__end sidebar-primary__section">
+ <div class="sidebar-primary-item">
+<div id="ethical-ad-placement"
+ class="flat"
+ data-ea-publisher="readthedocs"
+ data-ea-type="readthedocs-sidebar"
+ data-ea-manual="true">
+</div></div>
+ </div>
+
+
+ </div>
+
+ <main id="main-content" class="bd-main" role="main">
+
+
+ <div class="bd-content">
+ <div class="bd-article-container">
+
+ <div class="bd-header-article d-print-none">
+<div class="header-article-items header-article__inner">
+
+ <div class="header-article-items__start">
+
+ <div class="header-article-item">
+
+<nav aria-label="Breadcrumb" class="d-print-none">
+ <ul class="bd-breadcrumbs">
+
+ <li class="breadcrumb-item breadcrumb-home">
+ <a href="../index.html" class="nav-link" aria-label="Home">
+ <i class="fa-solid fa-home"></i>
+ </a>
+ </li>
+ <li class="breadcrumb-item active" aria-current="page"><span
class="ellipsis">Running Iceberg Spark Tests</span></li>
+ </ul>
+</nav>
+</div>
+
+ </div>
+
+
+</div>
+</div>
+
+
+
+
+<div id="searchbox"></div>
+ <article class="bd-article">
+
+ <!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<section id="running-iceberg-spark-tests">
+<h1>Running Iceberg Spark Tests<a class="headerlink"
href="#running-iceberg-spark-tests" title="Link to this heading">#</a></h1>
+<p>Running Apache Iceberg’s Spark tests with Comet enabled is a good way to
ensure that Comet produces the same
+results as Spark when reading Iceberg tables. To enable this, we apply diff
files to the Apache Iceberg source
+code so that Comet is loaded when we run the tests.</p>
+<p>Here is an overview of the changes that the diffs make to Iceberg:</p>
+<ul class="simple">
+<li><p>Configure Comet as a dependency and set the correct version in <code
class="docutils literal notranslate"><span
class="pre">libs.versions.toml</span></code> and <code class="docutils literal
notranslate"><span class="pre">build.gradle</span></code></p></li>
+<li><p>Delete upstream Comet reader classes that reference legacy Comet APIs
removed in <a class="reference external"
href="https://github.com/apache/datafusion-comet/pull/3739">#3739</a>. These
classes were
+added upstream in <a class="reference external"
href="https://github.com/apache/iceberg/pull/15674">apache/iceberg#15674</a>
and depend on Comet’s old Iceberg Java integration. Since Comet now
+uses a native Iceberg scan, these classes fail to compile and must be
removed.</p></li>
+<li><p>Configure test base classes (<code class="docutils literal
notranslate"><span class="pre">TestBase</span></code>, <code class="docutils
literal notranslate"><span class="pre">ExtensionsTestBase</span></code>, <code
class="docutils literal notranslate"><span
class="pre">ScanTestBase</span></code>, etc.) to load the Comet Spark
+plugin and shuffle manager</p></li>
+</ul>
+<section id="install-comet">
+<h2>1. Install Comet<a class="headerlink" href="#install-comet" title="Link to
this heading">#</a></h2>
+<p>Run <code class="docutils literal notranslate"><span
class="pre">make</span> <span class="pre">release</span></code> in Comet to
install the Comet JAR into the local Maven repository, specifying the Spark
version.</p>
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span><span class="nv">PROFILES</span><span
class="o">=</span><span class="s2">"-Pspark-3.5"</span><span
class="w"> </span>make<span class="w"> </span>release
+</pre></div>
+</div>
+</section>
+<section id="clone-iceberg-and-apply-diff">
+<h2>2. Clone Iceberg and Apply Diff<a class="headerlink"
href="#clone-iceberg-and-apply-diff" title="Link to this heading">#</a></h2>
+<p>Clone Apache Iceberg locally and apply the diff file from Comet against the
matching tag.</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>git@github.com:apache/iceberg.git<span class="w"> </span>apache-iceberg
+<span class="nb">cd</span><span class="w"> </span>apache-iceberg
+git<span class="w"> </span>checkout<span class="w"> </span>apache-iceberg-1.8.1
+git<span class="w"> </span>apply<span class="w">
</span>../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+</pre></div>
+</div>
+</section>
+<section id="run-iceberg-spark-tests">
+<h2>3. Run Iceberg Spark Tests<a class="headerlink"
href="#run-iceberg-spark-tests" title="Link to this heading">#</a></h2>
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span><span class="nv">ENABLE_COMET</span><span
class="o">=</span><span class="nb">true</span><span class="w">
</span>./gradlew<span class="w"> </span>-DsparkVersions<span
class="o">=</span><span class="m">3</span>.5<span class="w">
</span>-DscalaVersion<span class="o">=</span><span class="m">2</span>.13<span
class="w"> </span>-DflinkVersions<span class="o">=</span><span class="w">
</span>-DkafkaVersions<span cla [...]
+<span class="w"> </span>:iceberg-spark:iceberg-spark-3.5_2.13:test<span
class="w"> </span><span class="se">\</span>
+<span class="w"> </span>-Pquick<span class="o">=</span><span
class="nb">true</span><span class="w"> </span>-x<span class="w"> </span>javadoc
+</pre></div>
+</div>
+<p>The three Gradle targets tested in CI are:</p>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">:iceberg-spark:iceberg-spark-<sparkVersion>_<scalaVersion>:test</span></code></p></li>
+<li><p><code class="docutils literal notranslate"><span
class="pre">:iceberg-spark:iceberg-spark-extensions-<sparkVersion>_<scalaVersion>:test</span></code></p></li>
+<li><p><code class="docutils literal notranslate"><span
class="pre">:iceberg-spark:iceberg-spark-runtime-<sparkVersion>_<scalaVersion>:integrationTest</span></code></p></li>
+</ul>
+</section>
+<section id="updating-diffs">
+<h2>Updating Diffs<a class="headerlink" href="#updating-diffs" title="Link to
this heading">#</a></h2>
+<p>To update a diff (e.g. after modifying test configuration), apply the
existing diff, make changes, then
+regenerate:</p>
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span><span class="nb">cd</span><span class="w">
</span>apache-iceberg
+git<span class="w"> </span>reset<span class="w"> </span>--hard<span class="w">
</span>apache-iceberg-1.8.1<span class="w"> </span><span
class="o">&&</span><span class="w"> </span>git<span class="w">
</span>clean<span class="w"> </span>-fd
+git<span class="w"> </span>apply<span class="w">
</span>../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+
+<span class="c1"># Make changes, then run spotless to fix formatting</span>
+./gradlew<span class="w"> </span>spotlessApply
+
+<span class="c1"># Stage any new or deleted files, then generate the
diff</span>
+git<span class="w"> </span>add<span class="w"> </span>-A
+git<span class="w"> </span>diff<span class="w">
</span>apache-iceberg-1.8.1<span class="w"> </span>><span class="w">
</span>../datafusion-comet/dev/diffs/iceberg-rust/1.8.1.diff
+</pre></div>
+</div>
+<p>Repeat for each Iceberg version (1.8.1, 1.9.1, 1.10.0). The file contents
differ between versions, so each
+diff must be generated against its own tag.</p>
+</section>
+<section id="running-tests-in-ci">
+<h2>Running Tests in CI<a class="headerlink" href="#running-tests-in-ci"
title="Link to this heading">#</a></h2>
+<p>The <code class="docutils literal notranslate"><span
class="pre">iceberg_spark_test.yml</span></code> workflow applies these diffs
and runs the three Gradle targets above against
+each Iceberg version. The test matrix covers Spark 3.4 and 3.5 across Iceberg
1.8.1, 1.9.1, and 1.10.0
+with Java 11 and 17. The workflow only runs when the PR title contains <code
class="docutils literal notranslate"><span
class="pre">[iceberg]</span></code>.</p>
+</section>
+</section>
+
+
+ </article>
+
+
+
+
+
+ <footer class="prev-next-footer d-print-none">
+
+<div class="prev-next-area">
+</div>
+ </footer>
+
+ </div>
+
+
+
+
+ </div>
+ <footer class="bd-footer-content">
+
+ </footer>
+
+ </main>
+ </div>
+ </div>
+
+ <!-- Scripts loaded after <body> so the DOM is not blocked -->
+ <script defer
src="../_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf"></script>
+<script defer
src="../_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf"></script>
+
+<!-- Based on pydata_sphinx_theme/footer.html -->
+<footer class="footer mt-5 mt-md-0">
+ <div class="container">
+
+ <div class="footer-item">
+ <p>Apache DataFusion, Apache DataFusion Comet, Apache, the Apache
feather logo, and the Apache DataFusion project logo</p>
+ <p>are either registered trademarks or trademarks of The Apache Software
Foundation in the United States and other countries.</p>
+ </div>
+ </div>
+</footer>
+
+
+ </body>
+</html>
\ No newline at end of file
diff --git a/objects.inv b/objects.inv
index c497802a5..a1b819115 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/searchindex.js b/searchindex.js
index f2cd344de..9a0e11862 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Format Your Code": [[12,
"format-your-code"]], "1. Install Comet": [[22, "install-comet"]], "1. Native
Operators (nativeExecs map)": [[4, "native-operators-nativeexecs-map"]], "2.
Build and Verify": [[12, "build-and-verify"]], "2. Clone Spark and Apply Diff":
[[22, "clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4,
"comet-jvm-operators"]], "3. Run Clippy (Recommended)": [[12 [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Format Your Code": [[12,
"format-your-code"]], "1. Install Comet": [[14, "install-comet"], [23,
"install-comet"]], "1. Native Operators (nativeExecs map)": [[4,
"native-operators-nativeexecs-map"]], "2. Build and Verify": [[12,
"build-and-verify"]], "2. Clone Iceberg and Apply Diff": [[14,
"clone-iceberg-and-apply-diff"]], "2. Clone Spark and Apply Diff": [[23,
"clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-m [...]
\ No newline at end of file
diff --git a/user-guide/latest/compatibility.html b/user-guide/latest/compatibility.html
index 2d0176278..0e16c7a39 100644
--- a/user-guide/latest/compatibility.html
+++ b/user-guide/latest/compatibility.html
@@ -1218,7 +1218,7 @@ or strings containing null bytes (e.g \u0000)</p></li>
<li><p><strong>string -> date</strong>: Only supports years between 262143 BC and 262142 AD</p></li>
<li><p><strong>string -> decimal</strong>: Does not support fullwidth unicode digits (e.g \uFF10)
or strings containing null bytes (e.g \u0000)</p></li>
-<li><p><strong>string -> timestamp</strong>: ANSI mode not supported</p></li>
+<li><p><strong>string -> timestamp</strong>: Not all valid formats are supported</p></li>
</ul>
<!-- prettier-ignore-end -->
<!--END:CAST_ANSI_TABLE-->
diff --git a/user-guide/latest/configs.html b/user-guide/latest/configs.html
index f77717aa7..82b24c214 100644
--- a/user-guide/latest/configs.html
+++ b/user-guide/latest/configs.html
@@ -694,35 +694,39 @@ under the License.
<td><p>The compression level to use when compressing shuffle files with
zstd.</p></td>
<td><p>1</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.enabled</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.directRead.enabled</span></code></p></td>
+<td><p>When enabled, native operators that consume shuffle output will read
compressed shuffle blocks directly in native code, bypassing Arrow FFI. Applies
to both native shuffle and JVM columnar shuffle. Requires
spark.comet.exec.shuffle.enabled to be true.</p></td>
+<td><p>true</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.enabled</span></code></p></td>
<td><p>Whether to enable Comet native shuffle. Note that this requires setting
<code class="docutils literal notranslate"><span
class="pre">spark.shuffle.manager</span></code> to <code class="docutils
literal notranslate"><span
class="pre">org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager</span></code>.
<code class="docutils literal notranslate"><span
class="pre">spark.shuffle.manager</span></code> must be set before starting the
Spark application and cannot be changed dur [...]
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.writeBufferSize</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.writeBufferSize</span></code></p></td>
<td><p>Size of the write buffer in bytes used by the native shuffle writer
when writing shuffle data to disk. Larger values may improve write performance
by reducing the number of system calls, but will use more memory. The default
is 1MB, which provides a good balance between performance and memory
usage.</p></td>
<td><p>1048576b</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.hash.enabled</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.hash.enabled</span></code></p></td>
<td><p>Whether to enable hash partitioning for Comet native shuffle.</p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.range.enabled</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.range.enabled</span></code></p></td>
<td><p>Whether to enable range partitioning for Comet native shuffle.</p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.roundrobin.enabled</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.roundrobin.enabled</span></code></p></td>
<td><p>Whether to enable round robin partitioning for Comet native shuffle.
This is disabled by default because Comet’s round-robin produces different
partition assignments than Spark. Spark sorts rows by their binary UnsafeRow
representation before assigning partitions, but Comet uses Arrow format which
has a different binary layout. Instead, Comet implements round-robin as hash
partitioning on all columns, which achieves the same goals: even distribution,
deterministic output (for faul [...]
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.roundrobin.maxHashColumns</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.native.shuffle.partitioning.roundrobin.maxHashColumns</span></code></p></td>
<td><p>The maximum number of columns to hash for round robin partitioning.
When set to 0 (the default), all columns are hashed. When set to a positive
value, only the first N columns are used for hashing, which can improve
performance for wide tables while still providing reasonable
distribution.</p></td>
<td><p>0</p></td>
</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.shuffle.preferDictionary.ratio</span></code></p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.shuffle.preferDictionary.ratio</span></code></p></td>
<td><p>The ratio of total values to distinct values in a string column, used to
decide whether to prefer dictionary encoding when shuffling the column. If the
ratio is higher than this config, dictionary encoding is used when shuffling the
string column. This config only takes effect when it is higher than 1.0. Note that
this config is only used when <code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.mode</span></code> is <code
class="docutils literal notranslate"><s [...]
<td><p>10.0</p></td>
</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.shuffle.sizeInBytesMultiplier</span></code></p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.shuffle.sizeInBytesMultiplier</span></code></p></td>
<td><p>Comet reports smaller shuffle sizes because it uses Arrow’s columnar
memory format. A smaller estimated exchange size can lead Spark to choose a
different join strategy, so Comet multiplies sizeInBytes by this amount to
avoid regressions in join strategy.</p></td>
<td><p>1.0</p></td>
</tr>
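The shuffle rows in this table state two dependencies: Comet native shuffle needs `spark.shuffle.manager` set to `org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager`, and `spark.comet.exec.shuffle.directRead.enabled` requires `spark.comet.exec.shuffle.enabled` to be true. The Python sketch that follows collects those settings in a plain dict, checks both rules, and renders them as `spark-submit --conf` arguments. `check_shuffle_conf` and `to_submit_args` are hypothetical helpers written for illustration, not part of Comet. The `true` defaults they assume for both flags come from the table.

```python
# Hypothetical helpers (not part of Comet): gather the documented shuffle
# settings and check the dependencies their descriptions state.
COMET_SHUFFLE_MANAGER = (
    "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
)

comet_shuffle_conf = {
    # Must be set before the Spark application starts; cannot be changed later.
    "spark.shuffle.manager": COMET_SHUFFLE_MANAGER,
    "spark.comet.exec.shuffle.enabled": "true",
    # Native direct read of compressed shuffle blocks, bypassing Arrow FFI.
    "spark.comet.exec.shuffle.directRead.enabled": "true",
    # Native shuffle writer buffer; 1048576b (1MB) is the documented default.
    "spark.comet.exec.shuffle.writeBufferSize": "1048576b",
}


def check_shuffle_conf(conf):
    """Return a list of human-readable problems with the given settings."""
    problems = []
    # Per the table, spark.comet.exec.shuffle.enabled defaults to true.
    shuffle_enabled = conf.get("spark.comet.exec.shuffle.enabled", "true") == "true"
    # directRead also defaults to true, so only flag it when set explicitly.
    direct_read = conf.get("spark.comet.exec.shuffle.directRead.enabled") == "true"
    if shuffle_enabled and conf.get("spark.shuffle.manager") != COMET_SHUFFLE_MANAGER:
        problems.append(
            "Comet native shuffle requires spark.shuffle.manager=" + COMET_SHUFFLE_MANAGER
        )
    if direct_read and not shuffle_enabled:
        problems.append(
            "directRead.enabled requires spark.comet.exec.shuffle.enabled=true"
        )
    return problems


def to_submit_args(conf):
    """Render settings as spark-submit --conf arguments, sorted by key."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    assert check_shuffle_conf(comet_shuffle_conf) == []
    print(" ".join(to_submit_args(comet_shuffle_conf)))
```

Because `spark.shuffle.manager` cannot be changed while the application is running, these values belong on the `spark-submit` command line or in `spark-defaults.conf` rather than in a runtime `SET` statement.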
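The round-robin rows describe implementing round-robin as hash partitioning on all columns, optionally limited to the first N columns by `spark.comet.native.shuffle.partitioning.roundrobin.maxHashColumns`. Here is a minimal Python sketch of that idea under stated assumptions: `zlib.crc32` stands in for whatever hash Comet actually uses, rows are plain tuples rather than Arrow batches, and `row_hash` / `round_robin_via_hash` are hypothetical names.

```python
import zlib


def row_hash(row, max_hash_columns=0):
    """Deterministic hash over a row's columns.

    max_hash_columns mirrors the documented maxHashColumns setting:
    0 hashes every column, a positive N hashes only the first N columns.
    """
    columns = row if max_hash_columns <= 0 else row[:max_hash_columns]
    h = 0
    for value in columns:
        # crc32 is a stand-in hash; passing the running value folds each
        # column into one result. The NUL separator keeps (1, 23) and (12, 3)
        # apart, because repr() never emits a raw NUL character.
        h = zlib.crc32((repr(value) + "\0").encode("utf-8"), h)
    return h


def round_robin_via_hash(rows, num_partitions, max_hash_columns=0):
    """Assign each row to partition row_hash(row) % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row_hash(row, max_hash_columns) % num_partitions].append(row)
    return partitions


rows = [(i, f"name-{i}", i * 2.5) for i in range(1000)]
parts = round_robin_via_hash(rows, 8)
print([len(p) for p in parts])  # roughly even partition sizes
```

Because the hash depends only on row contents, rerunning the assignment reproduces the same partitions, which is the determinism the description calls out. Unlike true round-robin, identical rows always land in the same partition, and a positive `max_hash_columns` trades some distribution quality for less hashing work on wide rows.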
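The `spark.comet.shuffle.preferDictionary.ratio` row describes a comparison between total and distinct values in a string column. The Python sketch that follows models only that comparison. It assumes that `None` counts as one distinct value and that an empty column never prefers a dictionary. It ignores the `spark.comet.exec.shuffle.mode` condition and does not model thresholds of 1.0 or lower. `prefer_dictionary` is a hypothetical name, not a Comet function.

```python
def prefer_dictionary(values, ratio=10.0):
    """Return True when total/distinct exceeds `ratio` (default 10.0, per the table)."""
    if ratio <= 1.0:
        # The docs only say the setting takes effect above 1.0; what Comet
        # does otherwise is not modelled in this sketch.
        raise ValueError("sketch only models ratios above 1.0")
    distinct = len(set(values))  # assumption: None counts as one distinct value
    if distinct == 0:
        return False  # assumption: an empty column never prefers a dictionary
    # Strictly "higher than" the configured ratio.
    return len(values) / distinct > ratio


low_card = ["US", "DE", "FR", "JP", "BR"] * 20    # 100 values, 5 distinct: ratio 20.0
high_card = [f"id-{i % 50}" for i in range(100)]  # 100 values, 50 distinct: ratio 2.0
print(prefer_dictionary(low_card), prefer_dictionary(high_card))
```

With the default of 10.0, a low-cardinality column such as a country code qualifies for dictionary encoding, while a column where most values are unique does not.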