This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 87c6ede63 Publish built docs triggered by
c112f473dbcea881e16e6057143118d54f4c7e3b
87c6ede63 is described below
commit 87c6ede63ed4e164c320e9cefd4f7aee60850582
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Jan 12 18:04:31 2026 +0000
Publish built docs triggered by c112f473dbcea881e16e6057143118d54f4c7e3b
---
_sources/contributor-guide/index.md.txt | 2 +
_sources/contributor-guide/jvm_shuffle.md.txt | 199 ++++++
_sources/contributor-guide/native_shuffle.md.txt | 243 +++++++
contributor-guide/adding_a_new_expression.html | 2 +
contributor-guide/adding_a_new_operator.html | 2 +
contributor-guide/benchmarking.html | 2 +
contributor-guide/contributing.html | 2 +
contributor-guide/debugging.html | 2 +
contributor-guide/development.html | 2 +
contributor-guide/ffi.html | 8 +-
contributor-guide/index.html | 26 +
contributor-guide/jvm_shuffle.html | 788 +++++++++++++++++++++
contributor-guide/native_shuffle.html | 866 +++++++++++++++++++++++
contributor-guide/parquet_scans.html | 8 +-
contributor-guide/plugin_overview.html | 2 +
contributor-guide/profiling_native_code.html | 2 +
contributor-guide/roadmap.html | 2 +
contributor-guide/spark-sql-tests.html | 2 +
contributor-guide/tracing.html | 2 +
objects.inv | Bin 1622 -> 1658 bytes
searchindex.js | 2 +-
21 files changed, 2157 insertions(+), 7 deletions(-)
diff --git a/_sources/contributor-guide/index.md.txt
b/_sources/contributor-guide/index.md.txt
index 7b0385094..db3270b6a 100644
--- a/_sources/contributor-guide/index.md.txt
+++ b/_sources/contributor-guide/index.md.txt
@@ -26,6 +26,8 @@ under the License.
Getting Started <contributing>
Comet Plugin Overview <plugin_overview>
Arrow FFI <ffi>
+JVM Shuffle <jvm_shuffle>
+Native Shuffle <native_shuffle>
Parquet Scans <parquet_scans>
Development Guide <development>
Debugging Guide <debugging>
diff --git a/_sources/contributor-guide/jvm_shuffle.md.txt
b/_sources/contributor-guide/jvm_shuffle.md.txt
new file mode 100644
index 000000000..e011651d2
--- /dev/null
+++ b/_sources/contributor-guide/jvm_shuffle.md.txt
@@ -0,0 +1,199 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# JVM Shuffle
+
+This document describes Comet's JVM-based columnar shuffle implementation
(`CometColumnarShuffle`), which
+writes shuffle data in Arrow IPC format using JVM code with native encoding.
For the fully native
+alternative, see [Native Shuffle](native_shuffle.md).
+
+## Overview
+
+Comet provides two shuffle implementations:
+
+- **CometNativeShuffle** (`CometExchange`): Fully native shuffle using Rust.
Takes columnar input directly
+ from Comet native operators and performs partitioning in native code.
+- **CometColumnarShuffle** (`CometColumnarExchange`): JVM-based shuffle that
operates on rows internally,
+ buffers `UnsafeRow`s in memory pages, and uses native code (via JNI) to
encode them to Arrow IPC format.
+ Uses Spark's partitioner for partition assignment. Can accept either
row-based or columnar input
+ (columnar input is converted to rows via `ColumnarToRowExec`).
+
+The JVM shuffle is selected via `CometShuffleDependency.shuffleType`.
+
+## When JVM Shuffle is Used
+
+JVM shuffle (`CometColumnarExchange`) is used instead of native shuffle
(`CometExchange`) in the following cases:
+
+1. **Shuffle mode is explicitly set to "jvm"**: When
`spark.comet.exec.shuffle.mode` is set to `jvm`.
+
+2. **Child plan is not a Comet native operator**: When the child plan is a
Spark row-based operator
+ (not a `CometPlan`), JVM shuffle is the only option since native shuffle
requires columnar input
+ from Comet operators.
+
+3. **Unsupported partitioning type**: Native shuffle only supports
`HashPartitioning`, `RangePartitioning`,
+ and `SinglePartition`. JVM shuffle additionally supports
`RoundRobinPartitioning`.
+
+4. **Unsupported partition key types**: For `HashPartitioning` and
`RangePartitioning`, native shuffle
+ only supports primitive types as partition keys. Complex types (struct,
array, map) cannot be used
+ as partition keys in native shuffle, though they are fully supported as
data columns in both implementations.
+
+## Input Handling
+
+### Spark Row-Based Input
+
+When the child plan is a Spark row-based operator, `CometColumnarExchange`
calls `child.execute()` which
+returns an `RDD[InternalRow]`. The rows flow directly to the JVM shuffle
writers.
+
+### Comet Columnar Input
+
+When the child plan is a Comet native operator (e.g., `CometHashAggregate`)
but JVM shuffle is selected
+(due to shuffle mode setting or unsupported partitioning),
`CometColumnarExchange` still calls
+`child.execute()`. Comet operators implement `doExecute()` by wrapping
themselves with `ColumnarToRowExec`:
+
+```scala
+// In CometExec base class
+override def doExecute(): RDD[InternalRow] =
+ ColumnarToRowExec(this).doExecute()
+```
+
+This means the data path becomes:
+
+```
+Comet Native (columnar) → ColumnarToRowExec → rows → JVM Shuffle → Arrow IPC →
columnar
+```
+
+This is less efficient than native shuffle which avoids the columnar-to-row
conversion:
+
+```
+Comet Native (columnar) → Native Shuffle → Arrow IPC → columnar
+```
+
+### Why Use Spark's Partitioner?
+
+JVM shuffle uses row-based input so it can leverage Spark's existing
partitioner infrastructure
+(`partitioner.getPartition(key)`). This allows Comet to support all of Spark's
partitioning schemes
+without reimplementing them in Rust. Native shuffle, by contrast, serializes
the partitioning scheme
+to protobuf and implements the partitioning logic in native code.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│ CometShuffleManager │
+│ - Extends Spark's ShuffleManager │
+│ - Routes to appropriate writer/reader based on ShuffleHandle type │
+└─────────────────────────────────────────────────────────────────────────┘
+ │
+ ┌────────────────────────┼────────────────────────┐
+ ▼ ▼ ▼
+┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
+│ CometBypassMerge- │ │ CometUnsafe- │ │ CometNative- │
+│ SortShuffleWriter │ │ ShuffleWriter │ │ ShuffleWriter │
+│ (hash-based) │ │ (sort-based) │ │ (fully native) │
+└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
+ │ │
+ ▼ ▼
+┌─────────────────────┐ ┌─────────────────────┐
+│ CometDiskBlock- │ │ CometShuffleExternal│
+│ Writer │ │ Sorter │
+└─────────────────────┘ └─────────────────────┘
+ │ │
+ └────────────┬───────────┘
+ ▼
+ ┌─────────────────────┐
+ │ SpillWriter │
+ │ (native encoding │
+ │ via JNI) │
+ └─────────────────────┘
+```
+
+## Key Classes
+
+### Shuffle Manager
+
+| Class | Location |
Description
|
+| ------------------------ | ------------------------------------------ |
------------------------------------------------------------------------------------------------------------------------------------------------
|
+| `CometShuffleManager` | `.../shuffle/CometShuffleManager.scala` |
Entry point. Extends Spark's `ShuffleManager`. Selects writer/reader based on
handle type. Delegates non-Comet shuffles to `SortShuffleManager`. |
+| `CometShuffleDependency` | `.../shuffle/CometShuffleDependency.scala` |
Extends `ShuffleDependency`. Contains `shuffleType` (`CometColumnarShuffle` or
`CometNativeShuffle`) and schema info. |
+
+### Shuffle Handles
+
+| Handle | Writer Strategy
|
+| ----------------------------------- |
--------------------------------------------------------- |
+| `CometBypassMergeSortShuffleHandle` | Hash-based: one file per partition,
merged at end |
+| `CometSerializedShuffleHandle` | Sort-based: records sorted by
partition ID, single output |
+| `CometNativeShuffleHandle` | Fully native shuffle
|
+
+Selection logic in `CometShuffleManager.shouldBypassMergeSort()`:
+
+- Uses bypass if partitions < threshold AND partitions × cores ≤ max threads
+- Otherwise uses sort-based to avoid OOM from many concurrent writers
+
+### Writers
+
+| Class | Location
| Description
|
+| ----------------------------------- |
---------------------------------------------------- |
-------------------------------------------------------------------------------------------------------------------
|
+| `CometBypassMergeSortShuffleWriter` |
`.../shuffle/CometBypassMergeSortShuffleWriter.java` | Hash-based writer.
Creates one `CometDiskBlockWriter` per partition. Supports async writes.
|
+| `CometUnsafeShuffleWriter` |
`.../shuffle/CometUnsafeShuffleWriter.java` | Sort-based writer. Uses
`CometShuffleExternalSorter` to buffer and sort records, then merges spill
files. |
+| `CometDiskBlockWriter` |
`.../shuffle/CometDiskBlockWriter.java` | Buffers rows in memory
pages for a single partition. Spills to disk via native encoding. Used by
bypass writer. |
+| `CometShuffleExternalSorter` |
`.../shuffle/sort/CometShuffleExternalSorter.java` | Buffers records across
all partitions, sorts by partition ID, spills sorted data. Used by unsafe
writer. |
+| `SpillWriter` | `.../shuffle/SpillWriter.java`
| Base class for spill logic. Manages memory pages and calls
`Native.writeSortedFileNative()` for Arrow IPC encoding. |
+
+### Reader
+
+| Class | Location
| Description
|
+| ------------------------------ |
------------------------------------------------ |
--------------------------------------------------------------------------------------------------
|
+| `CometBlockStoreShuffleReader` |
`.../shuffle/CometBlockStoreShuffleReader.scala` | Fetches shuffle blocks via
`ShuffleBlockFetcherIterator`. Decodes Arrow IPC to `ColumnarBatch`. |
+| `NativeBatchDecoderIterator` |
`.../shuffle/NativeBatchDecoderIterator.scala` | Reads compressed Arrow IPC
batches from input stream. Calls `Native.decodeShuffleBlock()` via JNI. |
+
+## Data Flow
+
+### Write Path
+
+1. `ShuffleWriteProcessor` calls `CometShuffleManager.getWriter()`
+2. Writer receives `Iterator[Product2[K, V]]` where V is `UnsafeRow`
+3. Rows are serialized and buffered in off-heap memory pages
+4. When memory threshold or batch size is reached, `SpillWriter.doSpilling()`
is called
+5. Native code (`Native.writeSortedFileNative()`) converts rows to Arrow
arrays and writes IPC format
+6. For bypass writer: partition files are concatenated into final output
+7. For sort writer: spill files are merged
+
+### Read Path
+
+1. `CometBlockStoreShuffleReader.read()` creates `ShuffleBlockFetcherIterator`
+2. For each block, `NativeBatchDecoderIterator` reads the IPC stream
+3. Native code (`Native.decodeShuffleBlock()`) decompresses and decodes to
Arrow arrays
+4. Arrow FFI imports arrays as `ColumnarBatch`
+
+## Memory Management
+
+- `CometShuffleMemoryAllocator`: Custom allocator for off-heap memory pages
+- Memory is allocated in pages; when allocation fails, writers spill to disk
+- `CometDiskBlockWriter` coordinates spilling across all partition writers
(largest first)
+- Async spilling is supported via `ShuffleThreadPool`
+
+## Configuration
+
+| Config | Description
|
+| ----------------------------------------------- |
----------------------------------- |
+| `spark.comet.columnar.shuffle.async.enabled` | Enable async spill writes
|
+| `spark.comet.columnar.shuffle.async.thread.num` | Threads per writer for
async |
+| `spark.comet.columnar.shuffle.batch.size` | Rows per Arrow batch
|
+| `spark.comet.columnar.shuffle.spill.threshold` | Row count threshold for
spill |
+| `spark.comet.exec.shuffle.compression.codec` | Compression codec (zstd,
lz4, etc.) |
diff --git a/_sources/contributor-guide/native_shuffle.md.txt
b/_sources/contributor-guide/native_shuffle.md.txt
new file mode 100644
index 000000000..e3d2dea47
--- /dev/null
+++ b/_sources/contributor-guide/native_shuffle.md.txt
@@ -0,0 +1,243 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Native Shuffle
+
+This document describes Comet's native shuffle implementation
(`CometNativeShuffle`), which performs
+shuffle operations entirely in Rust code for maximum performance. For the
JVM-based alternative,
+see [JVM Shuffle](jvm_shuffle.md).
+
+## Overview
+
+Native shuffle takes columnar input directly from Comet native operators and
performs partitioning,
+encoding, and writing in native Rust code. This avoids the
columnar-to-row-to-columnar conversion
+overhead that JVM shuffle incurs.
+
+```
+Comet Native (columnar) → Native Shuffle → Arrow IPC → columnar
+```
+
+Compare this to JVM shuffle's data path:
+
+```
+Comet Native (columnar) → ColumnarToRowExec → rows → JVM Shuffle → Arrow IPC →
columnar
+```
+
+## When Native Shuffle is Used
+
+Native shuffle (`CometExchange`) is selected when all of the following
conditions are met:
+
+1. **Shuffle mode allows native**: `spark.comet.exec.shuffle.mode` is `native`
or `auto`.
+
+2. **Child plan is a Comet native operator**: The child must be a `CometPlan`
that produces
+ columnar output. Row-based Spark operators require JVM shuffle.
+
+3. **Supported partitioning type**: Native shuffle supports:
+ - `HashPartitioning`
+ - `RangePartitioning`
+ - `SinglePartition`
+
+ `RoundRobinPartitioning` requires JVM shuffle.
+
+4. **Supported partition key types**: For `HashPartitioning` and
`RangePartitioning`, partition
+ keys must be primitive types. Complex types (struct, array, map) as
partition keys require
+ JVM shuffle. Note that complex types are fully supported as data columns in
native shuffle.
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ CometShuffleManager
│
+│ - Routes to CometNativeShuffleWriter for CometNativeShuffleHandle
│
+└─────────────────────────────────────────────────────────────────────────────┘
+ │
+ ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ CometNativeShuffleWriter
│
+│ - Constructs protobuf operator plan
│
+│ - Invokes native execution via CometExec.getCometIterator()
│
+└─────────────────────────────────────────────────────────────────────────────┘
+ │
+ ▼ (JNI)
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ ShuffleWriterExec (Rust)
│
+│ - DataFusion ExecutionPlan
│
+│ - Orchestrates partitioning and writing
│
+└─────────────────────────────────────────────────────────────────────────────┘
+ │ │
+ ▼ ▼
+┌───────────────────────────────────┐ ┌───────────────────────────────────┐
+│ MultiPartitionShuffleRepartitioner │ │ SinglePartitionShufflePartitioner │
+│ (hash/range partitioning) │ │ (single partition case) │
+└───────────────────────────────────┘ └───────────────────────────────────┘
+ │
+ ▼
+┌───────────────────────────────────┐
+│ ShuffleBlockWriter │
+│ (Arrow IPC + compression) │
+└───────────────────────────────────┘
+ │
+ ▼
+ ┌─────────────────┐
+ │ Data + Index │
+ │ Files │
+ └─────────────────┘
+```
+
+## Key Classes
+
+### Scala Side
+
+| Class | Location
| Description
|
+| ------------------------------ |
------------------------------------------------ |
---------------------------------------------------------------------------------------------
|
+| `CometShuffleExchangeExec` |
`.../shuffle/CometShuffleExchangeExec.scala` | Physical plan node.
Validates types and partitioning, creates `CometShuffleDependency`. |
+| `CometNativeShuffleWriter` |
`.../shuffle/CometNativeShuffleWriter.scala` | Implements `ShuffleWriter`.
Builds protobuf plan and invokes native execution. |
+| `CometShuffleDependency` | `.../shuffle/CometShuffleDependency.scala`
| Extends `ShuffleDependency`. Holds shuffle type, schema, and range
partition bounds. |
+| `CometBlockStoreShuffleReader` |
`.../shuffle/CometBlockStoreShuffleReader.scala` | Reads shuffle blocks via
`ShuffleBlockFetcherIterator`. Decodes Arrow IPC to `ColumnarBatch`. |
+| `NativeBatchDecoderIterator` |
`.../shuffle/NativeBatchDecoderIterator.scala` | Reads compressed Arrow IPC
from input stream. Calls native decode via JNI. |
+
+### Rust Side
+
+| File | Location | Description
|
+| ----------------------- | ------------------------------------ |
------------------------------------------------------------------------------------
|
+| `shuffle_writer.rs` | `native/core/src/execution/shuffle/` |
`ShuffleWriterExec` plan and partitioners. Main shuffle logic.
|
+| `codec.rs` | `native/core/src/execution/shuffle/` |
`ShuffleBlockWriter` for Arrow IPC encoding with compression. Also handles
decoding. |
+| `comet_partitioning.rs` | `native/core/src/execution/shuffle/` |
`CometPartitioning` enum defining partition schemes (Hash, Range, Single).
|
+
+## Data Flow
+
+### Write Path
+
+1. **Plan construction**: `CometNativeShuffleWriter` builds a protobuf
operator plan containing:
+ - A scan operator reading from the input iterator
+ - A `ShuffleWriter` operator with partitioning config and compression codec
+
+2. **Native execution**: `CometExec.getCometIterator()` executes the plan in
Rust.
+
+3. **Partitioning**: `ShuffleWriterExec` receives batches and routes to the
appropriate partitioner:
+ - `MultiPartitionShuffleRepartitioner`: For hash/range partitioning
+ - `SinglePartitionShufflePartitioner`: For single partition (simpler path)
+
+4. **Buffering and spilling**: The partitioner buffers rows per partition.
When memory pressure
+ exceeds the threshold, partitions spill to temporary files.
+
+5. **Encoding**: `ShuffleBlockWriter` encodes each partition's data as
compressed Arrow IPC:
+ - Writes compression type header
+ - Writes field count header
+ - Writes compressed IPC stream
+
+6. **Output files**: Two files are produced:
+ - **Data file**: Concatenated partition data
+ - **Index file**: Array of 8-byte little-endian offsets marking partition
boundaries
+
+7. **Commit**: Back in JVM, `CometNativeShuffleWriter` reads the index file to
get partition
+ lengths and commits via Spark's `IndexShuffleBlockResolver`.
+
+### Read Path
+
+1. `CometBlockStoreShuffleReader` fetches shuffle blocks via
`ShuffleBlockFetcherIterator`.
+
+2. For each block, `NativeBatchDecoderIterator`:
+ - Reads the 8-byte compressed length header
+ - Reads the 8-byte field count header
+ - Reads the compressed IPC data
+ - Calls `Native.decodeShuffleBlock()` via JNI
+
+3. Native code decompresses and deserializes the Arrow IPC stream.
+
+4. Arrow FFI transfers the `RecordBatch` to JVM as a `ColumnarBatch`.
+
+## Partitioning
+
+### Hash Partitioning
+
+Native shuffle implements Spark-compatible hash partitioning:
+
+- Uses Murmur3 hash function with seed 42 (matching Spark)
+- Computes hash of partition key columns
+- Applies modulo by partition count: `partition_id = hash % num_partitions`
+
+### Range Partitioning
+
+For range partitioning:
+
+1. Spark's `RangePartitioner` samples data and computes partition boundaries
on the driver.
+2. Boundaries are serialized to the native plan.
+3. Native code converts sort key columns to comparable row format.
+4. Binary search (`partition_point`) determines which partition each row
belongs to.
+
+### Single Partition
+
+The simplest case: all rows go to partition 0. Uses
`SinglePartitionShufflePartitioner` which
+simply concatenates batches to reach the configured batch size.
+
+## Memory Management
+
+Native shuffle uses DataFusion's memory management with spilling support:
+
+- **Memory pool**: Tracks memory usage across the shuffle operation.
+- **Spill threshold**: When buffered data exceeds the threshold, partitions
spill to disk.
+- **Per-partition spilling**: Each partition has its own spill file. Multiple
spills for a
+ partition are concatenated when writing the final output.
+- **Scratch space**: Reusable buffers for partition ID computation to reduce
allocations.
+
+The `MultiPartitionShuffleRepartitioner` manages:
+
+- `PartitionBuffer`: In-memory buffer for each partition
+- `SpillFile`: Temporary file for spilled data
+- Memory tracking via `MemoryConsumer` trait
+
+## Compression
+
+Native shuffle supports multiple compression codecs configured via
+`spark.comet.exec.shuffle.compression.codec`:
+
+| Codec | Description |
+| -------- | ------------------------------------------------------ |
+| `zstd` | Zstandard compression. Best ratio, configurable level. |
+| `lz4` | LZ4 compression. Fast with good ratio. |
+| `snappy` | Snappy compression. Fastest, lower ratio. |
+| `none` | No compression. |
+
+The compression codec is applied uniformly to all partitions. Each partition's
data is
+independently compressed, allowing parallel decompression during reads.
+
+## Configuration
+
+| Config | Default | Description
|
+| ------------------------------------------------- | ------- |
---------------------------------------- |
+| `spark.comet.exec.shuffle.enabled` | `true` | Enable Comet
shuffle |
+| `spark.comet.exec.shuffle.mode` | `auto` | Shuffle mode:
`native`, `jvm`, or `auto` |
+| `spark.comet.exec.shuffle.compression.codec` | `zstd` | Compression
codec |
+| `spark.comet.exec.shuffle.compression.zstd.level` | `1` | Zstd
compression level |
+| `spark.comet.shuffle.write.buffer.size` | `1MB` | Write buffer
size |
+| `spark.comet.columnar.shuffle.batch.size` | `8192` | Target rows
per batch |
+
+## Comparison with JVM Shuffle
+
+| Aspect | Native Shuffle | JVM Shuffle
|
+| ------------------- | -------------------------------------- |
--------------------------------- |
+| Input format | Columnar (direct from Comet operators) | Row-based
(via ColumnarToRowExec) |
+| Partitioning logic | Rust implementation | Spark's
partitioner |
+| Supported schemes | Hash, Range, Single | Hash, Range,
Single, RoundRobin |
+| Partition key types | Primitives only | Any type
|
+| Performance | Higher (no format conversion) | Lower
(columnar→row→columnar) |
+| Writer variants | Single path | Bypass (hash)
and sort-based |
+
+See [JVM Shuffle](jvm_shuffle.md) for details on the JVM-based implementation.
diff --git a/contributor-guide/adding_a_new_expression.html
b/contributor-guide/adding_a_new_expression.html
index 2410952d1..e33056296 100644
--- a/contributor-guide/adding_a_new_expression.html
+++ b/contributor-guide/adding_a_new_expression.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/adding_a_new_operator.html
b/contributor-guide/adding_a_new_operator.html
index 0a67ab9ce..9c6767840 100644
--- a/contributor-guide/adding_a_new_operator.html
+++ b/contributor-guide/adding_a_new_operator.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/benchmarking.html
b/contributor-guide/benchmarking.html
index 19b24d983..4b2cdba59 100644
--- a/contributor-guide/benchmarking.html
+++ b/contributor-guide/benchmarking.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/contributing.html
b/contributor-guide/contributing.html
index f84b530bb..0f607a0dd 100644
--- a/contributor-guide/contributing.html
+++ b/contributor-guide/contributing.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/debugging.html b/contributor-guide/debugging.html
index 40474f633..fc93e92f3 100644
--- a/contributor-guide/debugging.html
+++ b/contributor-guide/debugging.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2 current"><a class="current reference internal"
href="#">Debugging Guide</a></li>
diff --git a/contributor-guide/development.html
b/contributor-guide/development.html
index e6ad21df1..b4a843fdc 100644
--- a/contributor-guide/development.html
+++ b/contributor-guide/development.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2 current"><a class="current reference internal"
href="#">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/ffi.html b/contributor-guide/ffi.html
index 9e4ba44fd..8d0201fd2 100644
--- a/contributor-guide/ffi.html
+++ b/contributor-guide/ffi.html
@@ -65,7 +65,7 @@ under the License.
<script async="true" defer="true"
src="https://buttons.github.io/buttons.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
- <link rel="next" title="Comet Parquet Scan Implementations"
href="parquet_scans.html" />
+ <link rel="next" title="JVM Shuffle" href="jvm_shuffle.html" />
<link rel="prev" title="Comet Plugin Architecture"
href="plugin_overview.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2 current"><a class="current reference internal"
href="#">Arrow FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
@@ -834,11 +836,11 @@ t4 Batch handle released ArrowBuf freed
Data freed
</div>
</a>
<a class="right-next"
- href="parquet_scans.html"
+ href="jvm_shuffle.html"
title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
- <p class="prev-next-title">Comet Parquet Scan Implementations</p>
+ <p class="prev-next-title">JVM Shuffle</p>
</div>
<i class="fa-solid fa-angle-right"></i>
</a>
diff --git a/contributor-guide/index.html b/contributor-guide/index.html
index f49639ace..e5c7bcc0f 100644
--- a/contributor-guide/index.html
+++ b/contributor-guide/index.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
@@ -480,6 +482,30 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="ffi.html#further-reading">Further Reading</a></li>
</ul>
</li>
+<li class="toctree-l1"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a><ul>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#overview">Overview</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#when-jvm-shuffle-is-used">When JVM Shuffle is
Used</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#input-handling">Input Handling</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#architecture">Architecture</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#key-classes">Key Classes</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#data-flow">Data Flow</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#memory-management">Memory Management</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html#configuration">Configuration</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a><ul>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#overview">Overview</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#when-native-shuffle-is-used">When Native Shuffle is
Used</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#architecture">Architecture</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#key-classes">Key Classes</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#data-flow">Data Flow</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#partitioning">Partitioning</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#memory-management">Memory Management</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#compression">Compression</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#configuration">Configuration</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html#comparison-with-jvm-shuffle">Comparison with JVM
Shuffle</a></li>
+</ul>
+</li>
<li class="toctree-l1"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a><ul>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html#s3-support">S3 Support</a></li>
</ul>
diff --git a/contributor-guide/jvm_shuffle.html
b/contributor-guide/jvm_shuffle.html
new file mode 100644
index 000000000..5fca9e972
--- /dev/null
+++ b/contributor-guide/jvm_shuffle.html
@@ -0,0 +1,788 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+<!DOCTYPE html>
+
+
+<html lang="en" data-content_root="../" data-theme="light">
+
+ <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0"
/><meta name="viewport" content="width=device-width, initial-scale=1" />
+
+ <title>JVM Shuffle — Apache DataFusion Comet documentation</title>
+
+
+
+ <script data-cfasync="false">
+ document.documentElement.dataset.mode = localStorage.getItem("mode") ||
"light";
+ document.documentElement.dataset.theme = localStorage.getItem("theme") ||
"light";
+ </script>
+ <!--
+ this give us a css class that will be invisible only if js is disabled
+ -->
+ <noscript>
+ <style>
+ .pst-js-only { display: none !important; }
+
+ </style>
+ </noscript>
+
+ <!-- Loaded before other Sphinx assets -->
+ <link href="../_static/styles/theme.css?digest=8878045cc6db502f8baf"
rel="stylesheet" />
+<link
href="../_static/styles/pydata-sphinx-theme.css?digest=8878045cc6db502f8baf"
rel="stylesheet" />
+
+ <link rel="stylesheet" type="text/css"
href="../_static/pygments.css?v=8f2a1f02" />
+ <link rel="stylesheet" type="text/css"
href="../_static/theme_overrides.css?v=cd442bcd" />
+
+ <!-- So that users can add custom icons -->
+ <script
src="../_static/scripts/fontawesome.js?digest=8878045cc6db502f8baf"></script>
+ <!-- Pre-loaded scripts that we'll load fully later -->
+ <link rel="preload" as="script"
href="../_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf" />
+<link rel="preload" as="script"
href="../_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf" />
+
+ <script src="../_static/documentation_options.js?v=5929fcd5"></script>
+ <script src="../_static/doctools.js?v=9a2dae69"></script>
+ <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
+ <script>DOCUMENTATION_OPTIONS.pagename =
'contributor-guide/jvm_shuffle';</script>
+ <script async="true" defer="true"
src="https://buttons.github.io/buttons.js"></script>
+ <link rel="index" title="Index" href="../genindex.html" />
+ <link rel="search" title="Search" href="../search.html" />
+ <link rel="next" title="Native Shuffle" href="native_shuffle.html" />
+ <link rel="prev" title="Arrow FFI Usage in Comet" href="ffi.html" />
+ <meta name="viewport" content="width=device-width, initial-scale=1"/>
+ <meta name="docsearch:language" content="en"/>
+ <meta name="docsearch:version" content="" />
+ </head>
+
+
+ <body data-bs-spy="scroll" data-bs-target=".bd-toc-nav" data-offset="180"
data-bs-root-margin="0px 0px -60%" data-default-mode="light">
+
+
+
+ <div id="pst-skip-link" class="skip-link d-print-none"><a
href="#main-content">Skip to main content</a></div>
+
+ <div id="pst-scroll-pixel-helper"></div>
+
+ <button type="button" class="btn rounded-pill" id="pst-back-to-top">
+ <i class="fa-solid fa-arrow-up"></i>Back to top</button>
+
+
+ <dialog id="pst-search-dialog">
+
+<form class="bd-search d-flex align-items-center"
+ action="../search.html"
+ method="get">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <input type="search"
+ class="form-control"
+ name="q"
+ placeholder="Search the docs ..."
+ aria-label="Search the docs ..."
+ autocomplete="off"
+ autocorrect="off"
+ autocapitalize="off"
+ spellcheck="false"/>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd>K</kbd></span>
+</form>
+ </dialog>
+
+ <div class="pst-async-banner-revealer d-none">
+ <aside id="bd-header-version-warning" class="d-none d-print-none"
aria-label="Version warning"></aside>
+</div>
+
+
+ <header class="bd-header navbar navbar-expand-lg bd-navbar d-print-none">
+<div class="bd-header__inner bd-page-width">
+ <button class="pst-navbar-icon sidebar-toggle primary-toggle"
aria-label="Site navigation">
+ <span class="fa-solid fa-bars"></span>
+ </button>
+
+
+ <div class="col-lg-3 navbar-header-items__start">
+
+ <div class="navbar-item">
+
+
+
+
+
+<a class="navbar-brand logo" href="../index.html">
+
+
+
+
+
+
+
+
+ <img src="../_static/DataFusionComet-Logo-Light.png" class="logo__image
only-light" alt="Apache DataFusion Comet documentation - Home"/>
+ <img src="../_static/DataFusionComet-Logo-Dark.png" class="logo__image
only-dark pst-js-only" alt="Apache DataFusion Comet documentation - Home"/>
+
+
+</a></div>
+
+ </div>
+
+ <div class="col-lg-9 navbar-header-items">
+
+ <div class="me-auto navbar-header-items__center">
+
+ <div class="navbar-item">
+<nav>
+ <ul class="bd-navbar-elements navbar-nav">
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../about/index.html">
+ Comet Overview
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../user-guide/index.html">
+ User Guide
+ </a>
+</li>
+
+
+<li class="nav-item current active">
+ <a class="nav-link nav-internal" href="index.html">
+ Contributor Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../asf/index.html">
+ ASF Links
+ </a>
+</li>
+
+ </ul>
+</nav></div>
+
+ </div>
+
+
+ <div class="navbar-header-items__end">
+
+ <div class="navbar-item navbar-persistent--container">
+
+
+<button class="btn search-button-field search-button__button pst-js-only"
title="Search" aria-label="Search" data-bs-placement="bottom"
data-bs-toggle="tooltip">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <span class="search-button__default-text">Search</span>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd
class="kbd-shortcut__modifier">K</kbd></span>
+</button>
+ </div>
+
+
+ <div class="navbar-item"><ul class="navbar-icon-links"
+ aria-label="Icon Links">
+ <li class="nav-item">
+
+
+
+
+
+
+
+
+ <a href="https://github.com/apache/datafusion-comet" title="GitHub"
class="nav-link pst-navbar-icon" rel="noopener" target="_blank"
data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands
fa-github fa-lg" aria-hidden="true"></i>
+ <span class="sr-only">GitHub</span></a>
+ </li>
+</ul></div>
+
+ <div class="navbar-item">
+
+<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button
pst-js-only" aria-label="Color mode" data-bs-title="Color mode"
data-bs-placement="bottom" data-bs-toggle="tooltip">
+ <i class="theme-switch fa-solid fa-sun fa-lg"
data-mode="light" title="Light"></i>
+ <i class="theme-switch fa-solid fa-moon fa-lg"
data-mode="dark" title="Dark"></i>
+ <i class="theme-switch fa-solid fa-circle-half-stroke fa-lg"
data-mode="auto" title="System Settings"></i>
+</button></div>
+
+ </div>
+
+ </div>
+
+
+ <div class="navbar-persistent--mobile">
+
+<button class="btn search-button-field search-button__button pst-js-only"
title="Search" aria-label="Search" data-bs-placement="bottom"
data-bs-toggle="tooltip">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <span class="search-button__default-text">Search</span>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd
class="kbd-shortcut__modifier">K</kbd></span>
+</button>
+ </div>
+
+
+
+</div>
+
+ </header>
+
+
+ <div class="bd-container">
+ <div class="bd-container__inner bd-page-width">
+
+
+
+ <dialog id="pst-primary-sidebar-modal"></dialog>
+ <div id="pst-primary-sidebar" class="bd-sidebar-primary bd-sidebar">
+
+
+
+ <div class="sidebar-header-items sidebar-primary__section">
+
+
+ <div class="sidebar-header-items__center">
+
+
+
+ <div class="navbar-item">
+<nav>
+ <ul class="bd-navbar-elements navbar-nav">
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../about/index.html">
+ Comet Overview
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../user-guide/index.html">
+ User Guide
+ </a>
+</li>
+
+
+<li class="nav-item current active">
+ <a class="nav-link nav-internal" href="index.html">
+ Contributor Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../asf/index.html">
+ ASF Links
+ </a>
+</li>
+
+ </ul>
+</nav></div>
+
+
+ </div>
+
+
+
+ <div class="sidebar-header-items__end">
+
+ <div class="navbar-item"><ul class="navbar-icon-links"
+ aria-label="Icon Links">
+ <li class="nav-item">
+
+
+
+
+
+
+
+
+ <a href="https://github.com/apache/datafusion-comet" title="GitHub"
class="nav-link pst-navbar-icon" rel="noopener" target="_blank"
data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands
fa-github fa-lg" aria-hidden="true"></i>
+ <span class="sr-only">GitHub</span></a>
+ </li>
+</ul></div>
+
+ <div class="navbar-item">
+
+<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button
pst-js-only" aria-label="Color mode" data-bs-title="Color mode"
data-bs-placement="bottom" data-bs-toggle="tooltip">
+ <i class="theme-switch fa-solid fa-sun fa-lg"
data-mode="light" title="Light"></i>
+ <i class="theme-switch fa-solid fa-moon fa-lg"
data-mode="dark" title="Dark"></i>
+ <i class="theme-switch fa-solid fa-circle-half-stroke fa-lg"
data-mode="auto" title="System Settings"></i>
+</button></div>
+
+ </div>
+
+ </div>
+
+ <div class="sidebar-primary-items__start sidebar-primary__section">
+ <div class="sidebar-primary-item"><!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
+ <div class="bd-toc-item active">
+ <p aria-level="2" class="caption" role="heading"><span
class="caption-text">Index</span></p>
+<ul class="current">
+<li class="toctree-l1"><a class="reference internal"
href="../about/index.html">Comet Overview</a></li>
+<li class="toctree-l1"><a class="reference internal"
href="../user-guide/index.html">User Guide</a></li>
+<li class="toctree-l1 current"><a class="reference internal"
href="index.html">Contributor Guide</a><ul class="current">
+<li class="toctree-l2"><a class="reference internal"
href="contributing.html">Getting Started</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
+<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2 current"><a class="current reference internal"
href="#">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="benchmarking.html">Benchmarking Guide</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="adding_a_new_operator.html">Adding a New Operator</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="adding_a_new_expression.html">Adding a New Expression</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="tracing.html">Tracing</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="profiling_native_code.html">Profiling Native Code</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="spark-sql-tests.html">Spark SQL Tests</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="roadmap.html">Roadmap</a></li>
+<li class="toctree-l2"><a class="reference external"
href="https://github.com/apache/datafusion-comet">Github and Issue
Tracker</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal"
href="../asf/index.html">ASF Links</a></li>
+</ul>
+
+ </div>
+</nav>
+</div>
+ </div>
+
+
+ <div class="sidebar-primary-items__end sidebar-primary__section">
+ <div class="sidebar-primary-item">
+<div id="ethical-ad-placement"
+ class="flat"
+ data-ea-publisher="readthedocs"
+ data-ea-type="readthedocs-sidebar"
+ data-ea-manual="true">
+</div></div>
+ </div>
+
+
+ </div>
+
+ <main id="main-content" class="bd-main" role="main">
+
+
+ <div class="bd-content">
+ <div class="bd-article-container">
+
+ <div class="bd-header-article d-print-none">
+<div class="header-article-items header-article__inner">
+
+ <div class="header-article-items__start">
+
+ <div class="header-article-item">
+
+<nav aria-label="Breadcrumb" class="d-print-none">
+ <ul class="bd-breadcrumbs">
+
+ <li class="breadcrumb-item breadcrumb-home">
+ <a href="../index.html" class="nav-link" aria-label="Home">
+ <i class="fa-solid fa-home"></i>
+ </a>
+ </li>
+
+ <li class="breadcrumb-item"><a href="index.html" class="nav-link">Comet
Contributor Guide</a></li>
+
+ <li class="breadcrumb-item active" aria-current="page"><span
class="ellipsis">JVM Shuffle</span></li>
+ </ul>
+</nav>
+</div>
+
+ </div>
+
+
+</div>
+</div>
+
+
+
+
+<div id="searchbox"></div>
+ <article class="bd-article">
+
+ <!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<section id="jvm-shuffle">
+<h1>JVM Shuffle<a class="headerlink" href="#jvm-shuffle" title="Link to this
heading">#</a></h1>
+<p>This document describes Comet’s JVM-based columnar shuffle implementation
(<code class="docutils literal notranslate"><span
class="pre">CometColumnarShuffle</span></code>), which
+writes shuffle data in Arrow IPC format using JVM code with native encoding.
For the fully native
+alternative, see <a class="reference internal"
href="native_shuffle.html"><span class="std std-doc">Native
Shuffle</span></a>.</p>
+<section id="overview">
+<h2>Overview<a class="headerlink" href="#overview" title="Link to this
heading">#</a></h2>
+<p>Comet provides two shuffle implementations:</p>
+<ul class="simple">
+<li><p><strong>CometNativeShuffle</strong> (<code class="docutils literal
notranslate"><span class="pre">CometExchange</span></code>): Fully native
shuffle using Rust. Takes columnar input directly
+from Comet native operators and performs partitioning in native code.</p></li>
+<li><p><strong>CometColumnarShuffle</strong> (<code class="docutils literal
notranslate"><span class="pre">CometColumnarExchange</span></code>): JVM-based
shuffle that operates on rows internally,
+buffers <code class="docutils literal notranslate"><span
class="pre">UnsafeRow</span></code>s in memory pages, and uses native code (via
JNI) to encode them to Arrow IPC format.
+Uses Spark’s partitioner for partition assignment. Can accept either row-based
or columnar input
+(columnar input is converted to rows via <code class="docutils literal
notranslate"><span class="pre">ColumnarToRowExec</span></code>).</p></li>
+</ul>
+<p>The JVM shuffle is selected via <code class="docutils literal
notranslate"><span
class="pre">CometShuffleDependency.shuffleType</span></code>.</p>
+</section>
+<section id="when-jvm-shuffle-is-used">
+<h2>When JVM Shuffle is Used<a class="headerlink"
href="#when-jvm-shuffle-is-used" title="Link to this heading">#</a></h2>
+<p>JVM shuffle (<code class="docutils literal notranslate"><span
class="pre">CometColumnarExchange</span></code>) is used instead of native
shuffle (<code class="docutils literal notranslate"><span
class="pre">CometExchange</span></code>) in the following cases:</p>
+<ol class="arabic simple">
+<li><p><strong>Shuffle mode is explicitly set to “jvm”</strong>: When <code
class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.mode</span></code> is set to <code
class="docutils literal notranslate"><span
class="pre">jvm</span></code>.</p></li>
+<li><p><strong>Child plan is not a Comet native operator</strong>: When the
child plan is a Spark row-based operator
+(not a <code class="docutils literal notranslate"><span
class="pre">CometPlan</span></code>), JVM shuffle is the only option since
native shuffle requires columnar input
+from Comet operators.</p></li>
+<li><p><strong>Unsupported partitioning type</strong>: Native shuffle only
supports <code class="docutils literal notranslate"><span
class="pre">HashPartitioning</span></code>, <code class="docutils literal
notranslate"><span class="pre">RangePartitioning</span></code>,
+and <code class="docutils literal notranslate"><span
class="pre">SinglePartition</span></code>. JVM shuffle additionally supports
<code class="docutils literal notranslate"><span
class="pre">RoundRobinPartitioning</span></code>.</p></li>
+<li><p><strong>Unsupported partition key types</strong>: For <code
class="docutils literal notranslate"><span
class="pre">HashPartitioning</span></code> and <code class="docutils literal
notranslate"><span class="pre">RangePartitioning</span></code>, native shuffle
+only supports primitive types as partition keys. Complex types (struct, array,
map) cannot be used
+as partition keys in native shuffle, though they are fully supported as data
columns in both implementations.</p></li>
+</ol>
+</section>
+<section id="input-handling">
+<h2>Input Handling<a class="headerlink" href="#input-handling" title="Link to
this heading">#</a></h2>
+<section id="spark-row-based-input">
+<h3>Spark Row-Based Input<a class="headerlink" href="#spark-row-based-input"
title="Link to this heading">#</a></h3>
+<p>When the child plan is a Spark row-based operator, <code class="docutils
literal notranslate"><span class="pre">CometColumnarExchange</span></code>
calls <code class="docutils literal notranslate"><span
class="pre">child.execute()</span></code> which
+returns an <code class="docutils literal notranslate"><span
class="pre">RDD[InternalRow]</span></code>. The rows flow directly to the JVM
shuffle writers.</p>
+</section>
+<section id="comet-columnar-input">
+<h3>Comet Columnar Input<a class="headerlink" href="#comet-columnar-input"
title="Link to this heading">#</a></h3>
+<p>When the child plan is a Comet native operator (e.g., <code class="docutils
literal notranslate"><span class="pre">CometHashAggregate</span></code>) but
JVM shuffle is selected
+(due to shuffle mode setting or unsupported partitioning), <code
class="docutils literal notranslate"><span
class="pre">CometColumnarExchange</span></code> still calls
+<code class="docutils literal notranslate"><span
class="pre">child.execute()</span></code>. Comet operators implement <code
class="docutils literal notranslate"><span
class="pre">doExecute()</span></code> by wrapping themselves with <code
class="docutils literal notranslate"><span
class="pre">ColumnarToRowExec</span></code>:</p>
+<div class="highlight-scala notranslate"><div
class="highlight"><pre><span></span><span class="c1">// In CometExec base
class</span>
+<span class="k">override</span><span class="w"> </span><span
class="k">def</span><span class="w"> </span><span
class="nf">doExecute</span><span class="p">():</span><span class="w">
</span><span class="nc">RDD</span><span class="p">[</span><span
class="nc">InternalRow</span><span class="p">]</span><span class="w">
</span><span class="o">=</span>
+<span class="w"> </span><span class="nc">ColumnarToRowExec</span><span
class="p">(</span><span class="bp">this</span><span class="p">).</span><span
class="n">doExecute</span><span class="p">()</span>
+</pre></div>
+</div>
+<p>This means the data path becomes:</p>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>Comet Native (columnar) → ColumnarToRowExec
→ rows → JVM Shuffle → Arrow IPC → columnar
+</pre></div>
+</div>
+<p>This is less efficient than native shuffle which avoids the columnar-to-row
conversion:</p>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>Comet Native (columnar) → Native Shuffle →
Arrow IPC → columnar
+</pre></div>
+</div>
+</section>
+<section id="why-use-spark-s-partitioner">
+<h3>Why Use Spark’s Partitioner?<a class="headerlink"
href="#why-use-spark-s-partitioner" title="Link to this heading">#</a></h3>
+<p>JVM shuffle uses row-based input so it can leverage Spark’s existing
partitioner infrastructure
+(<code class="docutils literal notranslate"><span
class="pre">partitioner.getPartition(key)</span></code>). This allows Comet to
support all of Spark’s partitioning schemes
+without reimplementing them in Rust. Native shuffle, by contrast, serializes
the partitioning scheme
+to protobuf and implements the partitioning logic in native code.</p>
+</section>
+</section>
+<section id="architecture">
+<h2>Architecture<a class="headerlink" href="#architecture" title="Link to this
heading">#</a></h2>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>┌─────────────────────────────────────────────────────────────────────────┐
+│ CometShuffleManager │
+│ - Extends Spark's ShuffleManager │
+│ - Routes to appropriate writer/reader based on ShuffleHandle type │
+└─────────────────────────────────────────────────────────────────────────┘
+ │
+ ┌────────────────────────┼────────────────────────┐
+ ▼ ▼ ▼
+┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
+│ CometBypassMerge- │ │ CometUnsafe- │ │ CometNative- │
+│ SortShuffleWriter │ │ ShuffleWriter │ │ ShuffleWriter │
+│ (hash-based) │ │ (sort-based) │ │ (fully native) │
+└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
+ │ │
+ ▼ ▼
+┌─────────────────────┐ ┌─────────────────────┐
+│ CometDiskBlock- │ │ CometShuffleExternal│
+│ Writer │ │ Sorter │
+└─────────────────────┘ └─────────────────────┘
+ │ │
+ └────────────┬───────────┘
+ ▼
+ ┌─────────────────────┐
+ │ SpillWriter │
+ │ (native encoding │
+ │ via JNI) │
+ └─────────────────────┘
+</pre></div>
+</div>
+</section>
+<section id="key-classes">
+<h2>Key Classes<a class="headerlink" href="#key-classes" title="Link to this
heading">#</a></h2>
+<section id="shuffle-manager">
+<h3>Shuffle Manager<a class="headerlink" href="#shuffle-manager" title="Link
to this heading">#</a></h3>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Class</p></th>
+<th class="head"><p>Location</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometShuffleManager</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometShuffleManager.scala</span></code></p></td>
+<td><p>Entry point. Extends Spark’s <code class="docutils literal
notranslate"><span class="pre">ShuffleManager</span></code>. Selects
writer/reader based on handle type. Delegates non-Comet shuffles to <code
class="docutils literal notranslate"><span
class="pre">SortShuffleManager</span></code>.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">CometShuffleDependency</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometShuffleDependency.scala</span></code></p></td>
+<td><p>Extends <code class="docutils literal notranslate"><span
class="pre">ShuffleDependency</span></code>. Contains <code class="docutils
literal notranslate"><span class="pre">shuffleType</span></code> (<code
class="docutils literal notranslate"><span
class="pre">CometColumnarShuffle</span></code> or <code class="docutils literal
notranslate"><span class="pre">CometNativeShuffle</span></code>) and schema
info.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+<section id="shuffle-handles">
+<h3>Shuffle Handles<a class="headerlink" href="#shuffle-handles" title="Link
to this heading">#</a></h3>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Handle</p></th>
+<th class="head"><p>Writer Strategy</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometBypassMergeSortShuffleHandle</span></code></p></td>
+<td><p>Hash-based: one file per partition, merged at end</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">CometSerializedShuffleHandle</span></code></p></td>
+<td><p>Sort-based: records sorted by partition ID, single output</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometNativeShuffleHandle</span></code></p></td>
+<td><p>Fully native shuffle</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<p>Selection logic in <code class="docutils literal notranslate"><span
class="pre">CometShuffleManager.shouldBypassMergeSort()</span></code>:</p>
+<ul class="simple">
+<li><p>Uses bypass if partitions < threshold AND partitions × cores ≤ max
threads</p></li>
+<li><p>Otherwise uses sort-based to avoid OOM from many concurrent
writers</p></li>
+</ul>
+</section>
+<section id="writers">
+<h3>Writers<a class="headerlink" href="#writers" title="Link to this
heading">#</a></h3>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Class</p></th>
+<th class="head"><p>Location</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometBypassMergeSortShuffleWriter</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometBypassMergeSortShuffleWriter.java</span></code></p></td>
+<td><p>Hash-based writer. Creates one <code class="docutils literal
notranslate"><span class="pre">CometDiskBlockWriter</span></code> per
partition. Supports async writes.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">CometUnsafeShuffleWriter</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometUnsafeShuffleWriter.java</span></code></p></td>
+<td><p>Sort-based writer. Uses <code class="docutils literal
notranslate"><span class="pre">CometShuffleExternalSorter</span></code> to
buffer and sort records, then merges spill files.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometDiskBlockWriter</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometDiskBlockWriter.java</span></code></p></td>
+<td><p>Buffers rows in memory pages for a single partition. Spills to disk via
native encoding. Used by bypass writer.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">CometShuffleExternalSorter</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/sort/CometShuffleExternalSorter.java</span></code></p></td>
+<td><p>Buffers records across all partitions, sorts by partition ID, spills
sorted data. Used by unsafe writer.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">SpillWriter</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/SpillWriter.java</span></code></p></td>
+<td><p>Base class for spill logic. Manages memory pages and calls <code
class="docutils literal notranslate"><span
class="pre">Native.writeSortedFileNative()</span></code> for Arrow IPC
encoding.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+<section id="reader">
+<h3>Reader<a class="headerlink" href="#reader" title="Link to this
heading">#</a></h3>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Class</p></th>
+<th class="head"><p>Location</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometBlockStoreShuffleReader</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometBlockStoreShuffleReader.scala</span></code></p></td>
+<td><p>Fetches shuffle blocks via <code class="docutils literal
notranslate"><span class="pre">ShuffleBlockFetcherIterator</span></code>.
Decodes Arrow IPC to <code class="docutils literal notranslate"><span
class="pre">ColumnarBatch</span></code>.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">NativeBatchDecoderIterator</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/NativeBatchDecoderIterator.scala</span></code></p></td>
+<td><p>Reads compressed Arrow IPC batches from input stream. Calls <code
class="docutils literal notranslate"><span
class="pre">Native.decodeShuffleBlock()</span></code> via JNI.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+</section>
+<section id="data-flow">
+<h2>Data Flow<a class="headerlink" href="#data-flow" title="Link to this
heading">#</a></h2>
+<section id="write-path">
+<h3>Write Path<a class="headerlink" href="#write-path" title="Link to this
heading">#</a></h3>
+<ol class="arabic simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">ShuffleWriteProcessor</span></code> calls <code class="docutils
literal notranslate"><span
class="pre">CometShuffleManager.getWriter()</span></code></p></li>
+<li><p>Writer receives <code class="docutils literal notranslate"><span
class="pre">Iterator[Product2[K,</span> <span class="pre">V]]</span></code>
where V is <code class="docutils literal notranslate"><span
class="pre">UnsafeRow</span></code></p></li>
+<li><p>Rows are serialized and buffered in off-heap memory pages</p></li>
+<li><p>When memory threshold or batch size is reached, <code class="docutils
literal notranslate"><span class="pre">SpillWriter.doSpilling()</span></code>
is called</p></li>
+<li><p>Native code (<code class="docutils literal notranslate"><span
class="pre">Native.writeSortedFileNative()</span></code>) converts rows to
Arrow arrays and writes IPC format</p></li>
+<li><p>For bypass writer: partition files are concatenated into final
output</p></li>
+<li><p>For sort writer: spill files are merged</p></li>
+</ol>
+</section>
+<section id="read-path">
+<h3>Read Path<a class="headerlink" href="#read-path" title="Link to this
heading">#</a></h3>
+<ol class="arabic simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">CometBlockStoreShuffleReader.read()</span></code> creates <code
class="docutils literal notranslate"><span
class="pre">ShuffleBlockFetcherIterator</span></code></p></li>
+<li><p>For each block, <code class="docutils literal notranslate"><span
class="pre">NativeBatchDecoderIterator</span></code> reads the IPC
stream</p></li>
+<li><p>Native code (<code class="docutils literal notranslate"><span
class="pre">Native.decodeShuffleBlock()</span></code>) decompresses and decodes
to Arrow arrays</p></li>
+<li><p>Arrow FFI imports arrays as <code class="docutils literal
notranslate"><span class="pre">ColumnarBatch</span></code></p></li>
+</ol>
+</section>
+</section>
+<section id="memory-management">
+<h2>Memory Management<a class="headerlink" href="#memory-management"
title="Link to this heading">#</a></h2>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">CometShuffleMemoryAllocator</span></code>: Custom allocator for
off-heap memory pages</p></li>
+<li><p>Memory is allocated in pages; when allocation fails, writers spill to
disk</p></li>
+<li><p><code class="docutils literal notranslate"><span
class="pre">CometDiskBlockWriter</span></code> coordinates spilling across all
partition writers (largest first)</p></li>
+<li><p>Async spilling is supported via <code class="docutils literal
notranslate"><span class="pre">ShuffleThreadPool</span></code></p></li>
+</ul>
+</section>
+<section id="configuration">
+<h2>Configuration<a class="headerlink" href="#configuration" title="Link to
this heading">#</a></h2>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Config</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.columnar.shuffle.async.enabled</span></code></p></td>
+<td><p>Enable async spill writes</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.columnar.shuffle.async.thread.num</span></code></p></td>
+<td><p>Threads per writer for async</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.columnar.shuffle.batch.size</span></code></p></td>
+<td><p>Rows per Arrow batch</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.columnar.shuffle.spill.threshold</span></code></p></td>
+<td><p>Row count threshold for spill</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.compression.codec</span></code></p></td>
+<td><p>Compression codec (zstd, lz4, etc.)</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+</section>
+
+
+ </article>
+
+
+
+
+
+ <footer class="prev-next-footer d-print-none">
+
+<div class="prev-next-area">
+ <a class="left-prev"
+ href="ffi.html"
+ title="previous page">
+ <i class="fa-solid fa-angle-left"></i>
+ <div class="prev-next-info">
+ <p class="prev-next-subtitle">previous</p>
+ <p class="prev-next-title">Arrow FFI Usage in Comet</p>
+ </div>
+ </a>
+ <a class="right-next"
+ href="native_shuffle.html"
+ title="next page">
+ <div class="prev-next-info">
+ <p class="prev-next-subtitle">next</p>
+ <p class="prev-next-title">Native Shuffle</p>
+ </div>
+ <i class="fa-solid fa-angle-right"></i>
+ </a>
+</div>
+ </footer>
+
+ </div>
+
+
+
+
+ </div>
+ <footer class="bd-footer-content">
+
+ </footer>
+
+ </main>
+ </div>
+ </div>
+
+ <!-- Scripts loaded after <body> so the DOM is not blocked -->
+ <script defer
src="../_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf"></script>
+<script defer
src="../_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf"></script>
+
+<!-- Based on pydata_sphinx_theme/footer.html -->
+<footer class="footer mt-5 mt-md-0">
+ <div class="container">
+
+ <div class="footer-item">
+ <p>Apache DataFusion, Apache DataFusion Comet, Apache, the Apache
feather logo, and the Apache DataFusion project logo</p>
+ <p>are either registered trademarks or trademarks of The Apache Software
Foundation in the United States and other countries.</p>
+ </div>
+ </div>
+</footer>
+
+
+ </body>
+</html>
\ No newline at end of file
diff --git a/contributor-guide/native_shuffle.html
b/contributor-guide/native_shuffle.html
new file mode 100644
index 000000000..fefb43bb7
--- /dev/null
+++ b/contributor-guide/native_shuffle.html
@@ -0,0 +1,866 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+<!DOCTYPE html>
+
+
+<html lang="en" data-content_root="../" data-theme="light">
+
+ <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0"
/><meta name="viewport" content="width=device-width, initial-scale=1" />
+
+ <title>Native Shuffle — Apache DataFusion Comet
documentation</title>
+
+
+
+ <script data-cfasync="false">
+ document.documentElement.dataset.mode = localStorage.getItem("mode") ||
"light";
+ document.documentElement.dataset.theme = localStorage.getItem("theme") ||
"light";
+ </script>
+ <!--
+ this give us a css class that will be invisible only if js is disabled
+ -->
+ <noscript>
+ <style>
+ .pst-js-only { display: none !important; }
+
+ </style>
+ </noscript>
+
+ <!-- Loaded before other Sphinx assets -->
+ <link href="../_static/styles/theme.css?digest=8878045cc6db502f8baf"
rel="stylesheet" />
+<link
href="../_static/styles/pydata-sphinx-theme.css?digest=8878045cc6db502f8baf"
rel="stylesheet" />
+
+ <link rel="stylesheet" type="text/css"
href="../_static/pygments.css?v=8f2a1f02" />
+ <link rel="stylesheet" type="text/css"
href="../_static/theme_overrides.css?v=cd442bcd" />
+
+ <!-- So that users can add custom icons -->
+ <script
src="../_static/scripts/fontawesome.js?digest=8878045cc6db502f8baf"></script>
+ <!-- Pre-loaded scripts that we'll load fully later -->
+ <link rel="preload" as="script"
href="../_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf" />
+<link rel="preload" as="script"
href="../_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf" />
+
+ <script src="../_static/documentation_options.js?v=5929fcd5"></script>
+ <script src="../_static/doctools.js?v=9a2dae69"></script>
+ <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
+ <script>DOCUMENTATION_OPTIONS.pagename =
'contributor-guide/native_shuffle';</script>
+ <script async="true" defer="true"
src="https://buttons.github.io/buttons.js"></script>
+ <link rel="index" title="Index" href="../genindex.html" />
+ <link rel="search" title="Search" href="../search.html" />
+ <link rel="next" title="Comet Parquet Scan Implementations"
href="parquet_scans.html" />
+ <link rel="prev" title="JVM Shuffle" href="jvm_shuffle.html" />
+ <meta name="viewport" content="width=device-width, initial-scale=1"/>
+ <meta name="docsearch:language" content="en"/>
+ <meta name="docsearch:version" content="" />
+ </head>
+
+
+ <body data-bs-spy="scroll" data-bs-target=".bd-toc-nav" data-offset="180"
data-bs-root-margin="0px 0px -60%" data-default-mode="light">
+
+
+
+ <div id="pst-skip-link" class="skip-link d-print-none"><a
href="#main-content">Skip to main content</a></div>
+
+ <div id="pst-scroll-pixel-helper"></div>
+
+ <button type="button" class="btn rounded-pill" id="pst-back-to-top">
+ <i class="fa-solid fa-arrow-up"></i>Back to top</button>
+
+
+ <dialog id="pst-search-dialog">
+
+<form class="bd-search d-flex align-items-center"
+ action="../search.html"
+ method="get">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <input type="search"
+ class="form-control"
+ name="q"
+ placeholder="Search the docs ..."
+ aria-label="Search the docs ..."
+ autocomplete="off"
+ autocorrect="off"
+ autocapitalize="off"
+ spellcheck="false"/>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd>K</kbd></span>
+</form>
+ </dialog>
+
+ <div class="pst-async-banner-revealer d-none">
+ <aside id="bd-header-version-warning" class="d-none d-print-none"
aria-label="Version warning"></aside>
+</div>
+
+
+ <header class="bd-header navbar navbar-expand-lg bd-navbar d-print-none">
+<div class="bd-header__inner bd-page-width">
+ <button class="pst-navbar-icon sidebar-toggle primary-toggle"
aria-label="Site navigation">
+ <span class="fa-solid fa-bars"></span>
+ </button>
+
+
+ <div class="col-lg-3 navbar-header-items__start">
+
+ <div class="navbar-item">
+
+
+
+
+
+<a class="navbar-brand logo" href="../index.html">
+
+
+
+
+
+
+
+
+ <img src="../_static/DataFusionComet-Logo-Light.png" class="logo__image
only-light" alt="Apache DataFusion Comet documentation - Home"/>
+ <img src="../_static/DataFusionComet-Logo-Dark.png" class="logo__image
only-dark pst-js-only" alt="Apache DataFusion Comet documentation - Home"/>
+
+
+</a></div>
+
+ </div>
+
+ <div class="col-lg-9 navbar-header-items">
+
+ <div class="me-auto navbar-header-items__center">
+
+ <div class="navbar-item">
+<nav>
+ <ul class="bd-navbar-elements navbar-nav">
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../about/index.html">
+ Comet Overview
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../user-guide/index.html">
+ User Guide
+ </a>
+</li>
+
+
+<li class="nav-item current active">
+ <a class="nav-link nav-internal" href="index.html">
+ Contributor Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../asf/index.html">
+ ASF Links
+ </a>
+</li>
+
+ </ul>
+</nav></div>
+
+ </div>
+
+
+ <div class="navbar-header-items__end">
+
+ <div class="navbar-item navbar-persistent--container">
+
+
+<button class="btn search-button-field search-button__button pst-js-only"
title="Search" aria-label="Search" data-bs-placement="bottom"
data-bs-toggle="tooltip">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <span class="search-button__default-text">Search</span>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd
class="kbd-shortcut__modifier">K</kbd></span>
+</button>
+ </div>
+
+
+ <div class="navbar-item"><ul class="navbar-icon-links"
+ aria-label="Icon Links">
+ <li class="nav-item">
+
+
+
+
+
+
+
+
+ <a href="https://github.com/apache/datafusion-comet" title="GitHub"
class="nav-link pst-navbar-icon" rel="noopener" target="_blank"
data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands
fa-github fa-lg" aria-hidden="true"></i>
+ <span class="sr-only">GitHub</span></a>
+ </li>
+</ul></div>
+
+ <div class="navbar-item">
+
+<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button
pst-js-only" aria-label="Color mode" data-bs-title="Color mode"
data-bs-placement="bottom" data-bs-toggle="tooltip">
+ <i class="theme-switch fa-solid fa-sun fa-lg"
data-mode="light" title="Light"></i>
+ <i class="theme-switch fa-solid fa-moon fa-lg"
data-mode="dark" title="Dark"></i>
+ <i class="theme-switch fa-solid fa-circle-half-stroke fa-lg"
data-mode="auto" title="System Settings"></i>
+</button></div>
+
+ </div>
+
+ </div>
+
+
+ <div class="navbar-persistent--mobile">
+
+<button class="btn search-button-field search-button__button pst-js-only"
title="Search" aria-label="Search" data-bs-placement="bottom"
data-bs-toggle="tooltip">
+ <i class="fa-solid fa-magnifying-glass"></i>
+ <span class="search-button__default-text">Search</span>
+ <span class="search-button__kbd-shortcut"><kbd
class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd
class="kbd-shortcut__modifier">K</kbd></span>
+</button>
+ </div>
+
+
+
+</div>
+
+ </header>
+
+
+ <div class="bd-container">
+ <div class="bd-container__inner bd-page-width">
+
+
+
+ <dialog id="pst-primary-sidebar-modal"></dialog>
+ <div id="pst-primary-sidebar" class="bd-sidebar-primary bd-sidebar">
+
+
+
+ <div class="sidebar-header-items sidebar-primary__section">
+
+
+ <div class="sidebar-header-items__center">
+
+
+
+ <div class="navbar-item">
+<nav>
+ <ul class="bd-navbar-elements navbar-nav">
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../about/index.html">
+ Comet Overview
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../user-guide/index.html">
+ User Guide
+ </a>
+</li>
+
+
+<li class="nav-item current active">
+ <a class="nav-link nav-internal" href="index.html">
+ Contributor Guide
+ </a>
+</li>
+
+
+<li class="nav-item ">
+ <a class="nav-link nav-internal" href="../asf/index.html">
+ ASF Links
+ </a>
+</li>
+
+ </ul>
+</nav></div>
+
+
+ </div>
+
+
+
+ <div class="sidebar-header-items__end">
+
+ <div class="navbar-item"><ul class="navbar-icon-links"
+ aria-label="Icon Links">
+ <li class="nav-item">
+
+
+
+
+
+
+
+
+ <a href="https://github.com/apache/datafusion-comet" title="GitHub"
class="nav-link pst-navbar-icon" rel="noopener" target="_blank"
data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands
fa-github fa-lg" aria-hidden="true"></i>
+ <span class="sr-only">GitHub</span></a>
+ </li>
+</ul></div>
+
+ <div class="navbar-item">
+
+<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button
pst-js-only" aria-label="Color mode" data-bs-title="Color mode"
data-bs-placement="bottom" data-bs-toggle="tooltip">
+ <i class="theme-switch fa-solid fa-sun fa-lg"
data-mode="light" title="Light"></i>
+ <i class="theme-switch fa-solid fa-moon fa-lg"
data-mode="dark" title="Dark"></i>
+ <i class="theme-switch fa-solid fa-circle-half-stroke fa-lg"
data-mode="auto" title="System Settings"></i>
+</button></div>
+
+ </div>
+
+ </div>
+
+ <div class="sidebar-primary-items__start sidebar-primary__section">
+ <div class="sidebar-primary-item"><!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<nav class="bd-links" id="bd-docs-nav" aria-label="Main navigation">
+ <div class="bd-toc-item active">
+ <p aria-level="2" class="caption" role="heading"><span
class="caption-text">Index</span></p>
+<ul class="current">
+<li class="toctree-l1"><a class="reference internal"
href="../about/index.html">Comet Overview</a></li>
+<li class="toctree-l1"><a class="reference internal"
href="../user-guide/index.html">User Guide</a></li>
+<li class="toctree-l1 current"><a class="reference internal"
href="index.html">Contributor Guide</a><ul class="current">
+<li class="toctree-l2"><a class="reference internal"
href="contributing.html">Getting Started</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
+<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2 current"><a class="current reference internal"
href="#">Native Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="benchmarking.html">Benchmarking Guide</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="adding_a_new_operator.html">Adding a New Operator</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="adding_a_new_expression.html">Adding a New Expression</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="tracing.html">Tracing</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="profiling_native_code.html">Profiling Native Code</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="spark-sql-tests.html">Spark SQL Tests</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="roadmap.html">Roadmap</a></li>
+<li class="toctree-l2"><a class="reference external"
href="https://github.com/apache/datafusion-comet">Github and Issue
Tracker</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal"
href="../asf/index.html">ASF Links</a></li>
+</ul>
+
+ </div>
+</nav>
+</div>
+ </div>
+
+
+ <div class="sidebar-primary-items__end sidebar-primary__section">
+ <div class="sidebar-primary-item">
+<div id="ethical-ad-placement"
+ class="flat"
+ data-ea-publisher="readthedocs"
+ data-ea-type="readthedocs-sidebar"
+ data-ea-manual="true">
+</div></div>
+ </div>
+
+
+ </div>
+
+ <main id="main-content" class="bd-main" role="main">
+
+
+ <div class="bd-content">
+ <div class="bd-article-container">
+
+ <div class="bd-header-article d-print-none">
+<div class="header-article-items header-article__inner">
+
+ <div class="header-article-items__start">
+
+ <div class="header-article-item">
+
+<nav aria-label="Breadcrumb" class="d-print-none">
+ <ul class="bd-breadcrumbs">
+
+ <li class="breadcrumb-item breadcrumb-home">
+ <a href="../index.html" class="nav-link" aria-label="Home">
+ <i class="fa-solid fa-home"></i>
+ </a>
+ </li>
+
+ <li class="breadcrumb-item"><a href="index.html" class="nav-link">Comet
Contributor Guide</a></li>
+
+ <li class="breadcrumb-item active" aria-current="page"><span
class="ellipsis">Native Shuffle</span></li>
+ </ul>
+</nav>
+</div>
+
+ </div>
+
+
+</div>
+</div>
+
+
+
+
+<div id="searchbox"></div>
+ <article class="bd-article">
+
+ <!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<section id="native-shuffle">
+<h1>Native Shuffle<a class="headerlink" href="#native-shuffle" title="Link to
this heading">#</a></h1>
+<p>This document describes Comet’s native shuffle implementation (<code
class="docutils literal notranslate"><span
class="pre">CometNativeShuffle</span></code>), which performs
+shuffle operations entirely in Rust code for maximum performance. For the
JVM-based alternative,
+see <a class="reference internal" href="jvm_shuffle.html"><span class="std
std-doc">JVM Shuffle</span></a>.</p>
+<section id="overview">
+<h2>Overview<a class="headerlink" href="#overview" title="Link to this
heading">#</a></h2>
+<p>Native shuffle takes columnar input directly from Comet native operators
and performs partitioning,
+encoding, and writing in native Rust code. This avoids the
columnar-to-row-to-columnar conversion
+overhead that JVM shuffle incurs.</p>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>Comet Native (columnar) → Native Shuffle →
Arrow IPC → columnar
+</pre></div>
+</div>
+<p>Compare this to JVM shuffle’s data path:</p>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>Comet Native (columnar) → ColumnarToRowExec
→ rows → JVM Shuffle → Arrow IPC → columnar
+</pre></div>
+</div>
+</section>
+<section id="when-native-shuffle-is-used">
+<h2>When Native Shuffle is Used<a class="headerlink"
href="#when-native-shuffle-is-used" title="Link to this heading">#</a></h2>
+<p>Native shuffle (<code class="docutils literal notranslate"><span
class="pre">CometExchange</span></code>) is selected when all of the following
conditions are met:</p>
+<ol class="arabic">
+<li><p><strong>Shuffle mode allows native</strong>: <code class="docutils
literal notranslate"><span
class="pre">spark.comet.exec.shuffle.mode</span></code> is <code
class="docutils literal notranslate"><span class="pre">native</span></code> or
<code class="docutils literal notranslate"><span
class="pre">auto</span></code>.</p></li>
+<li><p><strong>Child plan is a Comet native operator</strong>: The child must
be a <code class="docutils literal notranslate"><span
class="pre">CometPlan</span></code> that produces
+columnar output. Row-based Spark operators require JVM shuffle.</p></li>
+<li><p><strong>Supported partitioning type</strong>: Native shuffle
supports:</p>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">HashPartitioning</span></code></p></li>
+<li><p><code class="docutils literal notranslate"><span
class="pre">RangePartitioning</span></code></p></li>
+<li><p><code class="docutils literal notranslate"><span
class="pre">SinglePartition</span></code></p></li>
+</ul>
+<p><code class="docutils literal notranslate"><span
class="pre">RoundRobinPartitioning</span></code> requires JVM shuffle.</p>
+</li>
+<li><p><strong>Supported partition key types</strong>: For <code
class="docutils literal notranslate"><span
class="pre">HashPartitioning</span></code> and <code class="docutils literal
notranslate"><span class="pre">RangePartitioning</span></code>, partition
+keys must be primitive types. Complex types (struct, array, map) as partition
keys require
+JVM shuffle. Note that complex types are fully supported as data columns in
native shuffle.</p></li>
+</ol>
+</section>
+<section id="architecture">
+<h2>Architecture<a class="headerlink" href="#architecture" title="Link to this
heading">#</a></h2>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>┌─────────────────────────────────────────────────────────────────────────────┐
+│ CometShuffleManager
│
+│ - Routes to CometNativeShuffleWriter for CometNativeShuffleHandle
│
+└─────────────────────────────────────────────────────────────────────────────┘
+ │
+ ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ CometNativeShuffleWriter
│
+│ - Constructs protobuf operator plan
│
+│ - Invokes native execution via CometExec.getCometIterator()
│
+└─────────────────────────────────────────────────────────────────────────────┘
+ │
+ ▼ (JNI)
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ ShuffleWriterExec (Rust)
│
+│ - DataFusion ExecutionPlan
│
+│ - Orchestrates partitioning and writing
│
+└─────────────────────────────────────────────────────────────────────────────┘
+ │ │
+ ▼ ▼
+┌───────────────────────────────────┐ ┌───────────────────────────────────┐
+│ MultiPartitionShuffleRepartitioner │ │ SinglePartitionShufflePartitioner │
+│ (hash/range partitioning) │ │ (single partition case) │
+└───────────────────────────────────┘ └───────────────────────────────────┘
+ │
+ ▼
+┌───────────────────────────────────┐
+│ ShuffleBlockWriter │
+│ (Arrow IPC + compression) │
+└───────────────────────────────────┘
+ │
+ ▼
+ ┌─────────────────┐
+ │ Data + Index │
+ │ Files │
+ └─────────────────┘
+</pre></div>
+</div>
+</section>
+<section id="key-classes">
+<h2>Key Classes<a class="headerlink" href="#key-classes" title="Link to this
heading">#</a></h2>
+<section id="scala-side">
+<h3>Scala Side<a class="headerlink" href="#scala-side" title="Link to this
heading">#</a></h3>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Class</p></th>
+<th class="head"><p>Location</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometShuffleExchangeExec</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometShuffleExchangeExec.scala</span></code></p></td>
+<td><p>Physical plan node. Validates types and partitioning, creates <code
class="docutils literal notranslate"><span
class="pre">CometShuffleDependency</span></code>.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">CometNativeShuffleWriter</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometNativeShuffleWriter.scala</span></code></p></td>
+<td><p>Implements <code class="docutils literal notranslate"><span
class="pre">ShuffleWriter</span></code>. Builds protobuf plan and invokes
native execution.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">CometShuffleDependency</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometShuffleDependency.scala</span></code></p></td>
+<td><p>Extends <code class="docutils literal notranslate"><span
class="pre">ShuffleDependency</span></code>. Holds shuffle type, schema, and
range partition bounds.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">CometBlockStoreShuffleReader</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/CometBlockStoreShuffleReader.scala</span></code></p></td>
+<td><p>Reads shuffle blocks via <code class="docutils literal
notranslate"><span class="pre">ShuffleBlockFetcherIterator</span></code>.
Decodes Arrow IPC to <code class="docutils literal notranslate"><span
class="pre">ColumnarBatch</span></code>.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">NativeBatchDecoderIterator</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">.../shuffle/NativeBatchDecoderIterator.scala</span></code></p></td>
+<td><p>Reads compressed Arrow IPC from input stream. Calls native decode via
JNI.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+<section id="rust-side">
+<h3>Rust Side<a class="headerlink" href="#rust-side" title="Link to this
heading">#</a></h3>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>File</p></th>
+<th class="head"><p>Location</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">shuffle_writer.rs</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">native/core/src/execution/shuffle/</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">ShuffleWriterExec</span></code> plan and partitioners. Main shuffle
logic.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">codec.rs</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">native/core/src/execution/shuffle/</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">ShuffleBlockWriter</span></code> for Arrow IPC encoding with
compression. Also handles decoding.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">comet_partitioning.rs</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">native/core/src/execution/shuffle/</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">CometPartitioning</span></code> enum defining partition schemes
(Hash, Range, Single).</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+</section>
+<section id="data-flow">
+<h2>Data Flow<a class="headerlink" href="#data-flow" title="Link to this
heading">#</a></h2>
+<section id="write-path">
+<h3>Write Path<a class="headerlink" href="#write-path" title="Link to this
heading">#</a></h3>
+<ol class="arabic simple">
+<li><p><strong>Plan construction</strong>: <code class="docutils literal
notranslate"><span class="pre">CometNativeShuffleWriter</span></code> builds a
protobuf operator plan containing:</p>
+<ul class="simple">
+<li><p>A scan operator reading from the input iterator</p></li>
+<li><p>A <code class="docutils literal notranslate"><span
class="pre">ShuffleWriter</span></code> operator with partitioning config and
compression codec</p></li>
+</ul>
+</li>
+<li><p><strong>Native execution</strong>: <code class="docutils literal
notranslate"><span class="pre">CometExec.getCometIterator()</span></code>
executes the plan in Rust.</p></li>
+<li><p><strong>Partitioning</strong>: <code class="docutils literal
notranslate"><span class="pre">ShuffleWriterExec</span></code> receives batches
and routes to the appropriate partitioner:</p>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">MultiPartitionShuffleRepartitioner</span></code>: For hash/range
partitioning</p></li>
+<li><p><code class="docutils literal notranslate"><span
class="pre">SinglePartitionShufflePartitioner</span></code>: For single
partition (simpler path)</p></li>
+</ul>
+</li>
+<li><p><strong>Buffering and spilling</strong>: The partitioner buffers rows
per partition. When memory pressure
+exceeds the threshold, partitions spill to temporary files.</p></li>
+<li><p><strong>Encoding</strong>: <code class="docutils literal
notranslate"><span class="pre">ShuffleBlockWriter</span></code> encodes each
partition’s data as compressed Arrow IPC:</p>
+<ul class="simple">
+<li><p>Writes compression type header</p></li>
+<li><p>Writes field count header</p></li>
+<li><p>Writes compressed IPC stream</p></li>
+</ul>
+</li>
+<li><p><strong>Output files</strong>: Two files are produced:</p>
+<ul class="simple">
+<li><p><strong>Data file</strong>: Concatenated partition data</p></li>
+<li><p><strong>Index file</strong>: Array of 8-byte little-endian offsets
marking partition boundaries</p></li>
+</ul>
+</li>
+<li><p><strong>Commit</strong>: Back in JVM, <code class="docutils literal
notranslate"><span class="pre">CometNativeShuffleWriter</span></code> reads the
index file to get partition
+lengths and commits via Spark’s <code class="docutils literal
notranslate"><span class="pre">IndexShuffleBlockResolver</span></code>.</p></li>
+</ol>
+</section>
+<section id="read-path">
+<h3>Read Path<a class="headerlink" href="#read-path" title="Link to this
heading">#</a></h3>
+<ol class="arabic simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">CometBlockStoreShuffleReader</span></code> fetches shuffle blocks
via <code class="docutils literal notranslate"><span
class="pre">ShuffleBlockFetcherIterator</span></code>.</p></li>
+<li><p>For each block, <code class="docutils literal notranslate"><span
class="pre">NativeBatchDecoderIterator</span></code>:</p>
+<ul class="simple">
+<li><p>Reads the 8-byte compressed length header</p></li>
+<li><p>Reads the 8-byte field count header</p></li>
+<li><p>Reads the compressed IPC data</p></li>
+<li><p>Calls <code class="docutils literal notranslate"><span
class="pre">Native.decodeShuffleBlock()</span></code> via JNI</p></li>
+</ul>
+</li>
+<li><p>Native code decompresses and deserializes the Arrow IPC stream.</p></li>
+<li><p>Arrow FFI transfers the <code class="docutils literal
notranslate"><span class="pre">RecordBatch</span></code> to JVM as a <code
class="docutils literal notranslate"><span
class="pre">ColumnarBatch</span></code>.</p></li>
+</ol>
+</section>
+</section>
+<section id="partitioning">
+<h2>Partitioning<a class="headerlink" href="#partitioning" title="Link to this
heading">#</a></h2>
+<section id="hash-partitioning">
+<h3>Hash Partitioning<a class="headerlink" href="#hash-partitioning"
title="Link to this heading">#</a></h3>
+<p>Native shuffle implements Spark-compatible hash partitioning:</p>
+<ul class="simple">
+<li><p>Uses Murmur3 hash function with seed 42 (matching Spark)</p></li>
+<li><p>Computes hash of partition key columns</p></li>
+<li><p>Applies modulo by partition count: <code class="docutils literal
notranslate"><span class="pre">partition_id</span> <span class="pre">=</span>
<span class="pre">hash</span> <span class="pre">%</span> <span
class="pre">num_partitions</span></code></p></li>
+</ul>
+</section>
+<section id="range-partitioning">
+<h3>Range Partitioning<a class="headerlink" href="#range-partitioning"
title="Link to this heading">#</a></h3>
+<p>For range partitioning:</p>
+<ol class="arabic simple">
+<li><p>Spark’s <code class="docutils literal notranslate"><span
class="pre">RangePartitioner</span></code> samples data and computes partition
boundaries on the driver.</p></li>
+<li><p>Boundaries are serialized to the native plan.</p></li>
+<li><p>Native code converts sort key columns to comparable row format.</p></li>
+<li><p>Binary search (<code class="docutils literal notranslate"><span
class="pre">partition_point</span></code>) determines which partition each row
belongs to.</p></li>
+</ol>
+</section>
+<section id="single-partition">
+<h3>Single Partition<a class="headerlink" href="#single-partition" title="Link
to this heading">#</a></h3>
+<p>The simplest case: all rows go to partition 0. Uses <code class="docutils
literal notranslate"><span
class="pre">SinglePartitionShufflePartitioner</span></code> which
+simply concatenates batches to reach the configured batch size.</p>
+</section>
+</section>
+<section id="memory-management">
+<h2>Memory Management<a class="headerlink" href="#memory-management"
title="Link to this heading">#</a></h2>
+<p>Native shuffle uses DataFusion’s memory management with spilling
support:</p>
+<ul class="simple">
+<li><p><strong>Memory pool</strong>: Tracks memory usage across the shuffle
operation.</p></li>
+<li><p><strong>Spill threshold</strong>: When buffered data exceeds the
threshold, partitions spill to disk.</p></li>
+<li><p><strong>Per-partition spilling</strong>: Each partition has its own
spill file. Multiple spills for a
+partition are concatenated when writing the final output.</p></li>
+<li><p><strong>Scratch space</strong>: Reusable buffers for partition ID
computation to reduce allocations.</p></li>
+</ul>
+<p>The <code class="docutils literal notranslate"><span
class="pre">MultiPartitionShuffleRepartitioner</span></code> manages:</p>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span
class="pre">PartitionBuffer</span></code>: In-memory buffer for each
partition</p></li>
+<li><p><code class="docutils literal notranslate"><span
class="pre">SpillFile</span></code>: Temporary file for spilled data</p></li>
+<li><p>Memory tracking via <code class="docutils literal notranslate"><span
class="pre">MemoryConsumer</span></code> trait</p></li>
+</ul>
+</section>
+<section id="compression">
+<h2>Compression<a class="headerlink" href="#compression" title="Link to this
heading">#</a></h2>
+<p>Native shuffle supports multiple compression codecs configured via
+<code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.compression.codec</span></code>:</p>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Codec</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">zstd</span></code></p></td>
+<td><p>Zstandard compression. Best ratio, configurable level.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">lz4</span></code></p></td>
+<td><p>LZ4 compression. Fast with good ratio.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">snappy</span></code></p></td>
+<td><p>Snappy compression. Fastest, lower ratio.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">none</span></code></p></td>
+<td><p>No compression.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<p>The compression codec is applied uniformly to all partitions. Each
partition’s data is
+independently compressed, allowing parallel decompression during reads.</p>
+</section>
+<section id="configuration">
+<h2>Configuration<a class="headerlink" href="#configuration" title="Link to
this heading">#</a></h2>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Config</p></th>
+<th class="head"><p>Default</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.enabled</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">true</span></code></p></td>
+<td><p>Enable Comet shuffle</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.mode</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">auto</span></code></p></td>
+<td><p>Shuffle mode: <code class="docutils literal notranslate"><span
class="pre">native</span></code>, <code class="docutils literal
notranslate"><span class="pre">jvm</span></code>, or <code class="docutils
literal notranslate"><span class="pre">auto</span></code></p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.compression.codec</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">zstd</span></code></p></td>
+<td><p>Compression codec</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.compression.zstd.level</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">1</span></code></p></td>
+<td><p>Zstd compression level</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.shuffle.write.buffer.size</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">1MB</span></code></p></td>
+<td><p>Write buffer size</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span
class="pre">spark.comet.columnar.shuffle.batch.size</span></code></p></td>
+<td><p><code class="docutils literal notranslate"><span
class="pre">8192</span></code></p></td>
+<td><p>Target rows per batch</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</section>
+<section id="comparison-with-jvm-shuffle">
+<h2>Comparison with JVM Shuffle<a class="headerlink"
href="#comparison-with-jvm-shuffle" title="Link to this heading">#</a></h2>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Aspect</p></th>
+<th class="head"><p>Native Shuffle</p></th>
+<th class="head"><p>JVM Shuffle</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>Input format</p></td>
+<td><p>Columnar (direct from Comet operators)</p></td>
+<td><p>Row-based (via ColumnarToRowExec)</p></td>
+</tr>
+<tr class="row-odd"><td><p>Partitioning logic</p></td>
+<td><p>Rust implementation</p></td>
+<td><p>Spark’s partitioner</p></td>
+</tr>
+<tr class="row-even"><td><p>Supported schemes</p></td>
+<td><p>Hash, Range, Single</p></td>
+<td><p>Hash, Range, Single, RoundRobin</p></td>
+</tr>
+<tr class="row-odd"><td><p>Partition key types</p></td>
+<td><p>Primitives only</p></td>
+<td><p>Any type</p></td>
+</tr>
+<tr class="row-even"><td><p>Performance</p></td>
+<td><p>Higher (no format conversion)</p></td>
+<td><p>Lower (columnar→row→columnar)</p></td>
+</tr>
+<tr class="row-odd"><td><p>Writer variants</p></td>
+<td><p>Single path</p></td>
+<td><p>Bypass (hash) and sort-based</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<p>See <a class="reference internal" href="jvm_shuffle.html"><span class="std
std-doc">JVM Shuffle</span></a> for details on the JVM-based implementation.</p>
+</section>
+</section>
+
+
+ </article>
+
+
+
+
+
+ <footer class="prev-next-footer d-print-none">
+
+<div class="prev-next-area">
+ <a class="left-prev"
+ href="jvm_shuffle.html"
+ title="previous page">
+ <i class="fa-solid fa-angle-left"></i>
+ <div class="prev-next-info">
+ <p class="prev-next-subtitle">previous</p>
+ <p class="prev-next-title">JVM Shuffle</p>
+ </div>
+ </a>
+ <a class="right-next"
+ href="parquet_scans.html"
+ title="next page">
+ <div class="prev-next-info">
+ <p class="prev-next-subtitle">next</p>
+ <p class="prev-next-title">Comet Parquet Scan Implementations</p>
+ </div>
+ <i class="fa-solid fa-angle-right"></i>
+ </a>
+</div>
+ </footer>
+
+ </div>
+
+
+
+
+ </div>
+ <footer class="bd-footer-content">
+
+ </footer>
+
+ </main>
+ </div>
+ </div>
+
+ <!-- Scripts loaded after <body> so the DOM is not blocked -->
+ <script defer
src="../_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf"></script>
+<script defer
src="../_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf"></script>
+
+<!-- Based on pydata_sphinx_theme/footer.html -->
+<footer class="footer mt-5 mt-md-0">
+ <div class="container">
+
+ <div class="footer-item">
+ <p>Apache DataFusion, Apache DataFusion Comet, Apache, the Apache
feather logo, and the Apache DataFusion project logo</p>
+ <p>are either registered trademarks or trademarks of The Apache Software
Foundation in the United States and other countries.</p>
+ </div>
+ </div>
+</footer>
+
+
+ </body>
+</html>
\ No newline at end of file
diff --git a/contributor-guide/parquet_scans.html
b/contributor-guide/parquet_scans.html
index 2fcd4789d..c31994090 100644
--- a/contributor-guide/parquet_scans.html
+++ b/contributor-guide/parquet_scans.html
@@ -66,7 +66,7 @@ under the License.
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Comet Development Guide" href="development.html" />
- <link rel="prev" title="Arrow FFI Usage in Comet" href="ffi.html" />
+ <link rel="prev" title="Native Shuffle" href="native_shuffle.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
<meta name="docsearch:version" content="" />
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2 current"><a class="current reference internal"
href="#">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
@@ -596,12 +598,12 @@ continue to work as long as the configurations are
supported and can be translat
<div class="prev-next-area">
<a class="left-prev"
- href="ffi.html"
+ href="native_shuffle.html"
title="previous page">
<i class="fa-solid fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
- <p class="prev-next-title">Arrow FFI Usage in Comet</p>
+ <p class="prev-next-title">Native Shuffle</p>
</div>
</a>
<a class="right-next"
diff --git a/contributor-guide/plugin_overview.html
b/contributor-guide/plugin_overview.html
index 59a6ace09..eefdc627e 100644
--- a/contributor-guide/plugin_overview.html
+++ b/contributor-guide/plugin_overview.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2 current"><a class="current reference internal"
href="#">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/profiling_native_code.html
b/contributor-guide/profiling_native_code.html
index c22b13def..8d075bcab 100644
--- a/contributor-guide/profiling_native_code.html
+++ b/contributor-guide/profiling_native_code.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/roadmap.html b/contributor-guide/roadmap.html
index 19c2c4a60..a55d609bd 100644
--- a/contributor-guide/roadmap.html
+++ b/contributor-guide/roadmap.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/spark-sql-tests.html
b/contributor-guide/spark-sql-tests.html
index 7751b345d..2e3210ec2 100644
--- a/contributor-guide/spark-sql-tests.html
+++ b/contributor-guide/spark-sql-tests.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/contributor-guide/tracing.html b/contributor-guide/tracing.html
index 04c6923ec..8bf041684 100644
--- a/contributor-guide/tracing.html
+++ b/contributor-guide/tracing.html
@@ -357,6 +357,8 @@ under the License.
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html">Comet Plugin Architecture</a></li>
<li class="toctree-l2"><a class="reference internal"
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
<li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="jvm_shuffle.html">JVM Shuffle</a></li>
+<li class="toctree-l2"><a class="reference internal"
href="native_shuffle.html">Native Shuffle</a></li>
<li class="toctree-l2"><a class="reference internal"
href="parquet_scans.html">Parquet Scans</a></li>
<li class="toctree-l2"><a class="reference internal"
href="development.html">Development Guide</a></li>
<li class="toctree-l2"><a class="reference internal"
href="debugging.html">Debugging Guide</a></li>
diff --git a/objects.inv b/objects.inv
index 81bb2886f..dbb005c42 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/searchindex.js b/searchindex.js
index cd3cf6df9..602adbca9 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[19, "install-comet"]],
"1. Native Operators (nativeExecs map)": [[4,
"native-operators-nativeexecs-map"]], "2. Clone Spark and Apply Diff": [[19,
"clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4,
"comet-jvm-operators"]], "3. Run Spark SQL Tests": [[19,
"run-spark-sql-tests"]], "ANSI Mode": [[22, "ansi-mode"], [35, "ansi-mode"],
[48, "ansi-mode"], [88, "ans [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[21, "install-comet"]],
"1. Native Operators (nativeExecs map)": [[4,
"native-operators-nativeexecs-map"]], "2. Clone Spark and Apply Diff": [[21,
"clone-spark-and-apply-diff"]], "2. Sink Operators (sinks map)": [[4,
"sink-operators-sinks-map"]], "3. Comet JVM Operators": [[4,
"comet-jvm-operators"]], "3. Run Spark SQL Tests": [[21,
"run-spark-sql-tests"]], "ANSI Mode": [[24, "ansi-mode"], [37, "ansi-mode"],
[50, "ansi-mode"], [90, "ans [...]
\ No newline at end of file
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]