(datafusion-comet) branch asf-site updated: Publish built docs triggered by dba523d994f3f8336d2c5ca469c61672768611a1

github-bot Mon, 03 Nov 2025 12:53:27 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 7339c4b23 Publish built docs triggered by 
dba523d994f3f8336d2c5ca469c61672768611a1
7339c4b23 is described below

commit 7339c4b23e22db6c1d30d1bc758e2a93df2f234c
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon Nov 3 20:53:09 2025 +0000

    Publish built docs triggered by dba523d994f3f8336d2c5ca469c61672768611a1
---
 _sources/contributor-guide/index.md.txt            |   1 +
 _sources/contributor-guide/parquet_scans.md.txt    | 137 ++++++++++++++++
 _sources/user-guide/latest/compatibility.md.txt    |  68 +-------
 _sources/user-guide/latest/datasources.md.txt      |  72 +--------
 contributor-guide/adding_a_new_expression.html     |   1 +
 contributor-guide/benchmarking.html                |   1 +
 contributor-guide/contributing.html                |   1 +
 contributor-guide/debugging.html                   |   1 +
 contributor-guide/development.html                 |   7 +-
 contributor-guide/ffi.html                         |   7 +-
 contributor-guide/index.html                       |   5 +
 .../{tracing.html => parquet_scans.html}           | 177 +++++++++++++++------
 contributor-guide/plugin_overview.html             |   1 +
 contributor-guide/profiling_native_code.html       |   1 +
 contributor-guide/roadmap.html                     |   1 +
 contributor-guide/spark-sql-tests.html             |   1 +
 contributor-guide/tracing.html                     |   1 +
 objects.inv                                        | Bin 1486 -> 1509 bytes
 searchindex.js                                     |   2 +-
 user-guide/latest/compatibility.html               |  80 +---------
 user-guide/latest/datasources.html                 |  81 +---------
 21 files changed, 315 insertions(+), 331 deletions(-)

diff --git a/_sources/contributor-guide/index.md.txt 
b/_sources/contributor-guide/index.md.txt
index ba4692a97..eb79f7ab5 100644
--- a/_sources/contributor-guide/index.md.txt
+++ b/_sources/contributor-guide/index.md.txt
@@ -26,6 +26,7 @@ under the License.
 Getting Started <contributing>
 Comet Plugin Overview <plugin_overview>
 Arrow FFI <ffi>
+Parquet Scans <parquet_scans>
 Development Guide <development>
 Debugging Guide <debugging>
 Benchmarking Guide <benchmarking>
diff --git a/_sources/contributor-guide/parquet_scans.md.txt 
b/_sources/contributor-guide/parquet_scans.md.txt
new file mode 100644
index 000000000..4aec9f347
--- /dev/null
+++ b/_sources/contributor-guide/parquet_scans.md.txt
@@ -0,0 +1,137 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Comet Parquet Scan Implementations
+
+Comet currently has three distinct implementations of the Parquet scan 
operator. The configuration property
+`spark.comet.scan.impl` is used to select an implementation. The default 
setting is `spark.comet.scan.impl=auto`, and
+Comet will choose the most appropriate implementation based on the Parquet 
schema and other Comet configuration
+settings. Most users should not need to change this setting. However, it is 
possible to force Comet to try and use
+a particular implementation for all scan operations by setting this 
configuration property to one of the following
+implementations.
+
+| Implementation          | Description |
+| ----------------------- | ----------- |
+| `native_comet`          | This implementation provides strong compatibility with Spark but does not support complex types. This is the original scan implementation in Comet and may eventually be removed. |
+| `native_iceberg_compat` | This implementation delegates to DataFusion's `DataSourceExec` but uses a hybrid approach of JVM and native code. This scan is designed to be integrated with Iceberg in the future. |
+| `native_datafusion`     | This experimental implementation delegates to DataFusion's `DataSourceExec` for full native execution. There are known compatibility issues when using this scan. |
+
+The `native_datafusion` and `native_iceberg_compat` scans provide the 
following benefits over the `native_comet`
+implementation:
+
+- Leverages the DataFusion community's ongoing improvements to `DataSourceExec`
+- Provides support for reading complex types (structs, arrays, and maps)
+- Removes the use of reusable mutable-buffers in Comet, which is complex to 
maintain
+- Improves performance
+
+The `native_datafusion` and `native_iceberg_compat` scans share the following 
limitations:
+
+- When reading Parquet files written by systems other than Spark that contain columns with the logical types `UINT_8`
+  or `UINT_16`, Comet will produce different results than Spark because Spark does not preserve or understand these
+  logical types. Arrow-based readers, such as DataFusion and Comet, do respect these types and read the data as unsigned
+  rather than signed. By default, Comet will fall back to `native_comet` when scanning Parquet files containing `byte` or `short`
+  types (regardless of the logical type). This fallback can be disabled by setting
+  `spark.comet.scan.allowIncompatible=true`.
+- No support for default values that are nested types (e.g., maps, arrays, 
structs). Literal default values are supported.
+
+The `native_datafusion` scan has some additional limitations:
+
+- Bucketed scans are not supported
+- No support for row indexes
+- `PARQUET_FIELD_ID_READ_ENABLED` is not respected [#1758]
+- There are failures in the Spark SQL test suite [#1545]
+- Setting Spark configs `ignoreMissingFiles` or `ignoreCorruptFiles` to `true` 
is not compatible with Spark
+
+## S3 Support
+
+There are some 
+
+### `native_comet`
+
+The `native_comet` Parquet scan implementation reads data from S3 using the [Hadoop-AWS module](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html), which
+is identical to the approach commonly used with vanilla Spark. AWS credential configuration and other Hadoop S3A
+configurations work the same way as in vanilla Spark.
+
+### `native_datafusion` and `native_iceberg_compat`
+
+The `native_datafusion` and `native_iceberg_compat` Parquet scan implementations completely offload data loading
+to native code. They use the [`object_store` crate](https://crates.io/crates/object_store) to read data from S3 and
+support configuring S3 access using standard [Hadoop S3A configurations](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration) by translating them to
+the `object_store` crate's format.
+
+These implementations maintain compatibility with existing Hadoop S3A configurations, so existing code will
+continue to work as long as the configurations are supported and can be translated without loss of functionality.
+
+#### Additional S3 Configuration Options
+
+Beyond credential providers, the `native_datafusion` implementation supports 
additional S3 configuration options:
+
+| Option | Description |
+|--------|-------------|
+| `fs.s3a.endpoint` | The endpoint of the S3 service |
+| `fs.s3a.endpoint.region` | The AWS region for the S3 service. If not specified, the region will be auto-detected. |
+| `fs.s3a.path.style.access` | Whether to use path style access for the S3 service (true/false, defaults to virtual hosted style) |
+| `fs.s3a.requester.pays.enabled` | Whether to enable requester pays for S3 requests (true/false) |
+
+All configuration options support bucket-specific overrides using the pattern 
`fs.s3a.bucket.{bucket-name}.{option}`.
+
+#### Examples
+
+The following examples demonstrate how to configure S3 access with the 
`native_datafusion` Parquet scan implementation using different authentication 
methods.
+
+**Example 1: Simple Credentials**
+
+This example shows how to access a private S3 bucket using an access key and 
secret key. The `fs.s3a.aws.credentials.provider` configuration can be omitted 
since `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` is included in 
Hadoop S3A's default credential provider chain.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+...
+--conf spark.comet.scan.impl=native_datafusion \
+--conf spark.hadoop.fs.s3a.access.key=my-access-key \
+--conf spark.hadoop.fs.s3a.secret.key=my-secret-key
+...
+```
+
+**Example 2: Assume Role with Web Identity Token**
+
+This example demonstrates using an assumed role credential to access a private 
S3 bucket, where the base credential for assuming the role is provided by a web 
identity token credentials provider.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+...
+--conf spark.comet.scan.impl=native_datafusion \
+--conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
+--conf spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::123456789012:role/my-role \
+--conf spark.hadoop.fs.s3a.assumed.role.session.name=my-session \
+--conf spark.hadoop.fs.s3a.assumed.role.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider
+...
+```
+
+#### Limitations
+
+The S3 support of `native_datafusion` has the following limitations:
+
+1. **Partial Hadoop S3A configuration support**: Not all Hadoop S3A 
configurations are currently supported. Only the configurations listed in the 
tables above are translated and applied to the underlying `object_store` crate.
+
+2. **Custom credential providers**: Custom implementations of AWS credential 
providers are not supported. The implementation only supports the standard 
credential providers listed in the table above. We are planning to add support 
for custom credential providers through a JNI-based adapter that will allow 
calling Java credential providers from native code. See [issue 
#1829](https://github.com/apache/datafusion-comet/issues/1829) for more details.
+
+
+
+[#1545]: https://github.com/apache/datafusion-comet/issues/1545
+[#1758]: https://github.com/apache/datafusion-comet/issues/1758
diff --git a/_sources/user-guide/latest/compatibility.md.txt 
b/_sources/user-guide/latest/compatibility.md.txt
index ac2be802d..908693ff5 100644
--- a/_sources/user-guide/latest/compatibility.md.txt
+++ b/_sources/user-guide/latest/compatibility.md.txt
@@ -25,59 +25,11 @@ This guide offers information about areas of functionality 
where there are known
 
 ## Parquet
 
-### Data Type Support
+Comet has the following limitations when reading Parquet files:
 
-Comet does not support reading decimals encoded in binary format.
-
-### Parquet Scans
-
-Comet currently has three distinct implementations of the Parquet scan 
operator. The configuration property
-`spark.comet.scan.impl` is used to select an implementation. The default 
setting is `spark.comet.scan.impl=auto`, and
-Comet will choose the most appropriate implementation based on the Parquet 
schema and other Comet configuration
-settings. Most users should not need to change this setting. However, it is 
possible to force Comet to try and use
-a particular implementation for all scan operations by setting this 
configuration property to one of the following
-implementations.
-
-| Implementation          | Description                                        
                                                                                
                                                  |
-| ----------------------- | 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 |
-| `native_comet`          | This implementation provides strong compatibility 
with Spark but does not support complex types. This is the original scan 
implementation in Comet and may eventually be removed.    |
-| `native_iceberg_compat` | This implementation delegates to DataFusion's 
`DataSourceExec` but uses a hybrid approach of JVM and native code. This scan 
is designed to be integrated with Iceberg in the future. |
-| `native_datafusion`     | This experimental implementation delegates to 
DataFusion's `DataSourceExec` for full native execution. There are known 
compatibility issues when using this scan.                    |
-
-The `native_datafusion` and `native_iceberg_compat` scans provide the 
following benefits over the `native_comet`
-implementation:
-
-- Leverages the DataFusion community's ongoing improvements to `DataSourceExec`
-- Provides support for reading complex types (structs, arrays, and maps)
-- Removes the use of reusable mutable-buffers in Comet, which is complex to 
maintain
-- Improves performance
-
-The `native_datafusion` and `native_iceberg_compat` scans share the following 
limitations:
-
-- When reading Parquet files written by systems other than Spark that contain 
columns with the logical types `UINT_8`
-  or `UINT_16`, Comet will produce different results than Spark because Spark 
does not preserve or understand these
-  logical types. Arrow-based readers, such as DataFusion and Comet do respect 
these types and read the data as unsigned
-  rather than signed. By default, Comet will fall back to `native_comet` when 
scanning Parquet files containing `byte` or `short`
-  types (regardless of the logical type). This behavior can be disabled by 
setting
-  `spark.comet.scan.allowIncompatible=true`.
+- Comet does not support reading decimals encoded in binary format.
 - No support for default values that are nested types (e.g., maps, arrays, 
structs). Literal default values are supported.
 
-The `native_datafusion` scan has some additional limitations:
-
-- Bucketed scans are not supported
-- No support for row indexes
-- `PARQUET_FIELD_ID_READ_ENABLED` is not respected [#1758]
-- There are failures in the Spark SQL test suite [#1545]
-- Setting Spark configs `ignoreMissingFiles` or `ignoreCorruptFiles` to `true` 
is not compatible with Spark
-
-[#1545]: https://github.com/apache/datafusion-comet/issues/1545
-[#1758]: https://github.com/apache/datafusion-comet/issues/1758
-
-### S3 Support with `native_iceberg_compat`
-
-- When using the default AWS S3 endpoint (no custom endpoint configured), a 
valid region is required. Comet
-  will attempt to resolve the region if it is not provided.
-
 ## ANSI Mode
 
 Comet will fall back to Spark for the following expressions when ANSI mode is 
enabled, unless
@@ -101,18 +53,14 @@ Sorting on floating-point data types (or complex types 
containing floating-point
 Spark if the data contains both zero and negative zero. This is likely an edge 
case that is not of concern for many users
 and sorting on floating-point data can be enabled by setting 
`spark.comet.expression.SortOrder.allowIncompatible=true`.
 
-There is a known bug with using count(distinct) within aggregate queries, 
where each NaN value will be counted
-separately [#1824](https://github.com/apache/datafusion-comet/issues/1824).
-
 ## Incompatible Expressions
 
-Some Comet native expressions are not 100% compatible with Spark and are 
disabled by default. These expressions
-will fall back to Spark but can be enabled by setting 
`spark.comet.expression.allowIncompatible=true`.
-
-## Array Expressions
+Expressions that are not 100% Spark-compatible will fall back to Spark by 
default and can be enabled by setting
+`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is 
the Spark expression class name. See 
+the [Comet Supported Expressions Guide](expressions.md) for more information 
on this configuration setting.  
 
-Comet has experimental support for a number of array expressions. These are 
experimental and currently marked
-as incompatible and can be enabled by setting 
`spark.comet.expression.allowIncompatible=true`.
+It is also possible to specify `spark.comet.expression.allowIncompatible=true` 
to enable all
+incompatible expressions.
 
 ## Regular Expressions
 
@@ -127,7 +75,7 @@ Cast operations in Comet fall into three levels of support:
 - **Compatible**: The results match Apache Spark
 - **Incompatible**: The results may match Apache Spark for some inputs, but 
there are known issues where some inputs
   will result in incorrect results or exceptions. The query stage will fall 
back to Spark by default. Setting
-  `spark.comet.expression.allowIncompatible=true` will allow all incompatible 
casts to run natively in Comet, but this is not
+  `spark.comet.expression.Cast.allowIncompatible=true` will allow all 
incompatible casts to run natively in Comet, but this is not
   recommended for production use.
 - **Unsupported**: Comet does not provide a native version of this cast 
expression and the query stage will fall back to
   Spark.
diff --git a/_sources/user-guide/latest/datasources.md.txt 
b/_sources/user-guide/latest/datasources.md.txt
index 98bd61f71..14d0ecc15 100644
--- a/_sources/user-guide/latest/datasources.md.txt
+++ b/_sources/user-guide/latest/datasources.md.txt
@@ -163,23 +163,11 @@ Or use `spark-shell` with HDFS support as described 
[above](#using-experimental-
 
 ## S3
 
-DataFusion Comet has [multiple Parquet scan 
implementations](./compatibility.md#parquet-scans) that use different 
approaches to read data from S3.
-
-### `native_comet`
-
-The default `native_comet` Parquet scan implementation reads data from S3 
using the [Hadoop-AWS 
module](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html),
 which is identical to the approach commonly used with vanilla Spark. AWS 
credential configuration and other Hadoop S3A configurations works the same way 
as in vanilla Spark.
-
-### `native_datafusion` and `native_iceberg_compat`
-
-The `native_datafusion` and `native_iceberg_compat` Parquet scan 
implementations completely offload data loading to native code. They use the 
[`object_store` crate](https://crates.io/crates/object_store) to read data from 
S3 and support configuring S3 access using standard [Hadoop S3A 
configurations](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration)
 by translating them to the `object_store` crate's format.
-
-This implementation maintains compatibility with existing Hadoop S3A 
configurations, so existing code will continue to work as long as the 
configurations are supported and can be translated without loss of 
functionality.
-
 #### Root CA Certificates
 
-One major difference between `native_comet` and the other scan implementations 
is the mechanism for discovering Root
-CA Certificates. The `native_comet` scan uses the JVM to read CA Certificates 
from the Java Trust Store, but the native
-scan implementations `native_datafusion` and `native_iceberg_compat` use 
system Root CA Certificates (typically stored 
+One major difference between Spark and Comet is the mechanism for discovering 
Root
+CA Certificates. Spark uses the JVM to read CA Certificates from the Java 
Trust Store, but native Comet
+scans use system Root CA Certificates (typically stored 
 in `/etc/ssl/certs` on Linux). These scans will not be able to interact with 
S3 if the Root CA Certificates are not
 installed.
 
@@ -200,57 +188,3 @@ AWS credential providers can be configured using the 
`fs.s3a.aws.credentials.pro
 | 
`com.amazonaws.auth.WebIdentityTokenCredentialsProvider`<br/>`software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider`
 | Authenticate using web identity token file | None |
 
 Multiple credential providers can be specified in a comma-separated list using 
the `fs.s3a.aws.credentials.provider` configuration, just as Hadoop AWS 
supports. If `fs.s3a.aws.credentials.provider` is not configured, Hadoop S3A's 
default credential provider chain will be used. All configuration options also 
support bucket-specific overrides using the pattern 
`fs.s3a.bucket.{bucket-name}.{option}`.
-
-#### Additional S3 Configuration Options
-
-Beyond credential providers, the `native_datafusion` implementation supports 
additional S3 configuration options:
-
-| Option | Description |
-|--------|-------------|
-| `fs.s3a.endpoint` | The endpoint of the S3 service |
-| `fs.s3a.endpoint.region` | The AWS region for the S3 service. If not 
specified, the region will be auto-detected. |
-| `fs.s3a.path.style.access` | Whether to use path style access for the S3 
service (true/false, defaults to virtual hosted style) |
-| `fs.s3a.requester.pays.enabled` | Whether to enable requester pays for S3 
requests (true/false) |
-
-All configuration options support bucket-specific overrides using the pattern 
`fs.s3a.bucket.{bucket-name}.{option}`.
-
-#### Examples
-
-The following examples demonstrate how to configure S3 access with the 
`native_datafusion` Parquet scan implementation using different authentication 
methods.
-
-**Example 1: Simple Credentials**
-
-This example shows how to access a private S3 bucket using an access key and 
secret key. The `fs.s3a.aws.credentials.provider` configuration can be omitted 
since `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` is included in 
Hadoop S3A's default credential provider chain.
-
-```shell
-$SPARK_HOME/bin/spark-shell \
-...
---conf spark.comet.scan.impl=native_datafusion \
---conf spark.hadoop.fs.s3a.access.key=my-access-key \
---conf spark.hadoop.fs.s3a.secret.key=my-secret-key
-...
-```
-
-**Example 2: Assume Role with Web Identity Token**
-
-This example demonstrates using an assumed role credential to access a private 
S3 bucket, where the base credential for assuming the role is provided by a web 
identity token credentials provider.
-
-```shell
-$SPARK_HOME/bin/spark-shell \
-...
---conf spark.comet.scan.impl=native_datafusion \
---conf 
spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
 \
---conf 
spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::123456789012:role/my-role \
---conf spark.hadoop.fs.s3a.assumed.role.session.name=my-session \
---conf 
spark.hadoop.fs.s3a.assumed.role.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider
-...
-```
-
-#### Limitations
-
-The S3 support of `native_datafusion` has the following limitations:
-
-1. **Partial Hadoop S3A configuration support**: Not all Hadoop S3A 
configurations are currently supported. Only the configurations listed in the 
tables above are translated and applied to the underlying `object_store` crate.
-
-2. **Custom credential providers**: Custom implementations of AWS credential 
providers are not supported. The implementation only supports the standard 
credential providers listed in the table above. We are planning to add support 
for custom credential providers through a JNI-based adapter that will allow 
calling Java credential providers from native code. See [issue 
#1829](https://github.com/apache/datafusion-comet/issues/1829) for more details.
-
diff --git a/contributor-guide/adding_a_new_expression.html 
b/contributor-guide/adding_a_new_expression.html
index d749ae49b..28c2a9c48 100644
--- a/contributor-guide/adding_a_new_expression.html
+++ b/contributor-guide/adding_a_new_expression.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/contributor-guide/benchmarking.html 
b/contributor-guide/benchmarking.html
index 3d65c3ff5..6723a56bf 100644
--- a/contributor-guide/benchmarking.html
+++ b/contributor-guide/benchmarking.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2 current"><a class="current reference internal" 
href="#">Benchmarking Guide</a></li>
diff --git a/contributor-guide/contributing.html 
b/contributor-guide/contributing.html
index 4c91dfb07..31fba8be2 100644
--- a/contributor-guide/contributing.html
+++ b/contributor-guide/contributing.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/contributor-guide/debugging.html b/contributor-guide/debugging.html
index 197b3a3e5..445e001bd 100644
--- a/contributor-guide/debugging.html
+++ b/contributor-guide/debugging.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2 current"><a class="current reference internal" 
href="#">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/contributor-guide/development.html 
b/contributor-guide/development.html
index c60c4dd6b..a034984d1 100644
--- a/contributor-guide/development.html
+++ b/contributor-guide/development.html
@@ -66,7 +66,7 @@ under the License.
     <link rel="index" title="Index" href="../genindex.html" />
     <link rel="search" title="Search" href="../search.html" />
     <link rel="next" title="Comet Debugging Guide" href="debugging.html" />
-    <link rel="prev" title="Arrow FFI Usage in Comet" href="ffi.html" />
+    <link rel="prev" title="Comet Parquet Scan Implementations" 
href="parquet_scans.html" />
   <meta name="viewport" content="width=device-width, initial-scale=1"/>
   <meta name="docsearch:language" content="en"/>
   <meta name="docsearch:version" content="" />
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2 current"><a class="current reference internal" 
href="#">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
@@ -601,12 +602,12 @@ cargo<span class="w"> </span>clippy<span class="w"> 
</span>--color<span class="o
                   
 <div class="prev-next-area">
     <a class="left-prev"
-       href="ffi.html"
+       href="parquet_scans.html"
        title="previous page">
       <i class="fa-solid fa-angle-left"></i>
       <div class="prev-next-info">
         <p class="prev-next-subtitle">previous</p>
-        <p class="prev-next-title">Arrow FFI Usage in Comet</p>
+        <p class="prev-next-title">Comet Parquet Scan Implementations</p>
       </div>
     </a>
     <a class="right-next"
diff --git a/contributor-guide/ffi.html b/contributor-guide/ffi.html
index 93dfc9e12..c8585787a 100644
--- a/contributor-guide/ffi.html
+++ b/contributor-guide/ffi.html
@@ -65,7 +65,7 @@ under the License.
     <script async="true" defer="true" src="https://buttons.github.io/buttons.js"></script>
     <link rel="index" title="Index" href="../genindex.html" />
     <link rel="search" title="Search" href="../search.html" />
-    <link rel="next" title="Comet Development Guide" href="development.html" />
+    <link rel="next" title="Comet Parquet Scan Implementations" 
href="parquet_scans.html" />
     <link rel="prev" title="Comet Plugin Architecture" 
href="plugin_overview.html" />
   <meta name="viewport" content="width=device-width, initial-scale=1"/>
   <meta name="docsearch:language" content="en"/>
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2 current"><a class="current reference internal" 
href="#">Arrow FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
@@ -832,11 +833,11 @@ t4      Batch handle released      ArrowBuf freed        
Data freed
       </div>
     </a>
     <a class="right-next"
-       href="development.html"
+       href="parquet_scans.html"
        title="next page">
       <div class="prev-next-info">
         <p class="prev-next-subtitle">next</p>
-        <p class="prev-next-title">Comet Development Guide</p>
+        <p class="prev-next-title">Comet Parquet Scan Implementations</p>
       </div>
       <i class="fa-solid fa-angle-right"></i>
     </a>
diff --git a/contributor-guide/index.html b/contributor-guide/index.html
index 40f181483..52f0039f6 100644
--- a/contributor-guide/index.html
+++ b/contributor-guide/index.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
@@ -478,6 +479,10 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="ffi.html#further-reading">Further Reading</a></li>
 </ul>
 </li>
+<li class="toctree-l1"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a><ul>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html#s3-support">S3 Support</a></li>
+</ul>
+</li>
 <li class="toctree-l1"><a class="reference internal" 
href="development.html">Development Guide</a><ul>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html#project-layout">Project Layout</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html#development-setup">Development Setup</a></li>
diff --git a/contributor-guide/tracing.html 
b/contributor-guide/parquet_scans.html
similarity index 55%
copy from contributor-guide/tracing.html
copy to contributor-guide/parquet_scans.html
index 74cffacaa..e0589b79b 100644
--- a/contributor-guide/tracing.html
+++ b/contributor-guide/parquet_scans.html
@@ -27,7 +27,7 @@ under the License.
     <meta charset="utf-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" 
/><meta name="viewport" content="width=device-width, initial-scale=1" />
 
-    <title>Tracing &#8212; Apache DataFusion Comet  documentation</title>
+    <title>Comet Parquet Scan Implementations &#8212; Apache DataFusion Comet  
documentation</title>
   
   
   
@@ -61,12 +61,12 @@ under the License.
     <script src="../_static/documentation_options.js?v=5929fcd5"></script>
     <script src="../_static/doctools.js?v=9a2dae69"></script>
     <script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
-    <script>DOCUMENTATION_OPTIONS.pagename = 
'contributor-guide/tracing';</script>
+    <script>DOCUMENTATION_OPTIONS.pagename = 
'contributor-guide/parquet_scans';</script>
     <script async="true" defer="true" 
src="https://buttons.github.io/buttons.js"></script>
     <link rel="index" title="Index" href="../genindex.html" />
     <link rel="search" title="Search" href="../search.html" />
-    <link rel="next" title="Profiling Native Code" 
href="profiling_native_code.html" />
-    <link rel="prev" title="Adding a New Expression" 
href="adding_a_new_expression.html" />
+    <link rel="next" title="Comet Development Guide" href="development.html" />
+    <link rel="prev" title="Arrow FFI Usage in Comet" href="ffi.html" />
   <meta name="viewport" content="width=device-width, initial-scale=1"/>
   <meta name="docsearch:language" content="en"/>
   <meta name="docsearch:version" content="" />
@@ -357,11 +357,12 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2 current"><a class="current reference internal" 
href="#">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="adding_a_new_expression.html">Adding a New Expression</a></li>
-<li class="toctree-l2 current"><a class="current reference internal" 
href="#">Tracing</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="tracing.html">Tracing</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="profiling_native_code.html">Profiling Native Code</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="spark-sql-tests.html">Spark SQL Tests</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="roadmap.html">Roadmap</a></li>
@@ -414,7 +415,7 @@ under the License.
     
     <li class="breadcrumb-item"><a href="index.html" class="nav-link">Comet 
Contributor Guide</a></li>
     
-    <li class="breadcrumb-item active" aria-current="page"><span 
class="ellipsis">Tracing</span></li>
+    <li class="breadcrumb-item active" aria-current="page"><span 
class="ellipsis">Comet Parquet Scan Implementations</span></li>
   </ul>
 </nav>
 </div>
@@ -449,56 +450,138 @@ KIND, either express or implied.  See the License for the
 specific language governing permissions and limitations
 under the License.
 -->
-<section id="tracing">
-<h1>Tracing<a class="headerlink" href="#tracing" title="Link to this 
heading">#</a></h1>
-<p>Tracing can be enabled by setting <code class="docutils literal 
notranslate"><span 
class="pre">spark.comet.tracing.enabled=true</span></code>.</p>
-<p>With this feature enabled, each Spark executor will write a JSON event log 
file in
-Chrome’s <a class="reference external" 
href="https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview?tab=t.0#heading=h.yr4qxyxotyw">Trace
 Event Format</a>. The file will be written to the executor’s current working
-directory with the filename <code class="docutils literal notranslate"><span 
class="pre">comet-event-trace.json</span></code>.</p>
-<p>Additionally, enabling the <code class="docutils literal notranslate"><span 
class="pre">jemalloc</span></code> feature will enable tracing of native memory 
allocations.</p>
-<div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span>make<span class="w"> </span>release<span 
class="w"> </span><span class="nv">COMET_FEATURES</span><span 
class="o">=</span><span class="s2">&quot;jemalloc&quot;</span>
-</pre></div>
-</div>
-<p>Example output:</p>
-<div class="highlight-json notranslate"><div 
class="highlight"><pre><span></span><span class="p">{</span><span class="w"> 
</span><span class="nt">&quot;name&quot;</span><span class="p">:</span><span 
class="w"> </span><span class="s2">&quot;decodeShuffleBlock&quot;</span><span 
class="p">,</span><span class="w"> </span><span 
class="nt">&quot;cat&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;PERF&quot;</span><span class="p">,</span><span 
class="w"> </spa [...]
-<span class="p">{</span><span class="w"> </span><span 
class="nt">&quot;name&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;decodeShuffleBlock&quot;</span><span 
class="p">,</span><span class="w"> </span><span 
class="nt">&quot;cat&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;PERF&quot;</span><span class="p">,</span><span 
class="w"> </span><span class="nt">&quot;ph&quot;</span><span 
class="p">:</span><span class="w">  [...]
-<span class="p">{</span><span class="w"> </span><span 
class="nt">&quot;name&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;decodeShuffleBlock&quot;</span><span 
class="p">,</span><span class="w"> </span><span 
class="nt">&quot;cat&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;PERF&quot;</span><span class="p">,</span><span 
class="w"> </span><span class="nt">&quot;ph&quot;</span><span 
class="p">:</span><span class="w">  [...]
-<span class="p">{</span><span class="w"> </span><span 
class="nt">&quot;name&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;decodeShuffleBlock&quot;</span><span 
class="p">,</span><span class="w"> </span><span 
class="nt">&quot;cat&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;PERF&quot;</span><span class="p">,</span><span 
class="w"> </span><span class="nt">&quot;ph&quot;</span><span 
class="p">:</span><span class="w">  [...]
-<span class="p">{</span><span class="w"> </span><span 
class="nt">&quot;name&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;execute_plan&quot;</span><span 
class="p">,</span><span class="w"> </span><span 
class="nt">&quot;cat&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;PERF&quot;</span><span class="p">,</span><span 
class="w"> </span><span class="nt">&quot;ph&quot;</span><span 
class="p">:</span><span class="w"> </span [...]
-<span class="p">{</span><span class="w"> </span><span 
class="nt">&quot;name&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;CometExecIterator_getNextBatch&quot;</span><span 
class="p">,</span><span class="w"> </span><span 
class="nt">&quot;cat&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;PERF&quot;</span><span class="p">,</span><span 
class="w"> </span><span class="nt">&quot;ph&quot;</span><span 
class="p">:</span><span [...]
-<span class="p">{</span><span class="w"> </span><span 
class="nt">&quot;name&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;CometExecIterator_getNextBatch&quot;</span><span 
class="p">,</span><span class="w"> </span><span 
class="nt">&quot;cat&quot;</span><span class="p">:</span><span class="w"> 
</span><span class="s2">&quot;PERF&quot;</span><span class="p">,</span><span 
class="w"> </span><span class="nt">&quot;ph&quot;</span><span 
class="p">:</span><span [...]
-</pre></div>
-</div>
-<p>Traces can be viewed with <a class="reference external" 
href="https://github.com/catapult-project/catapult/blob/main/tracing/README.md">Trace
 Viewer</a>.</p>
-<p>Example trace visualization:</p>
-<p><img alt="tracing" src="../_images/tracing.png" /></p>
-<section id="definition-of-labels">
-<h2>Definition of Labels<a class="headerlink" href="#definition-of-labels" 
title="Link to this heading">#</a></h2>
+<section id="comet-parquet-scan-implementations">
+<h1>Comet Parquet Scan Implementations<a class="headerlink" 
href="#comet-parquet-scan-implementations" title="Link to this 
heading">#</a></h1>
+<p>Comet currently has three distinct implementations of the Parquet scan 
operator. The configuration property
+<code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.impl</span></code> is used to select an 
implementation. The default setting is <code class="docutils literal 
notranslate"><span class="pre">spark.comet.scan.impl=auto</span></code>, and
+Comet will choose the most appropriate implementation based on the Parquet 
schema and other Comet configuration
+settings. Most users should not need to change this setting. However, it is 
possible to force Comet to try to use
+a particular implementation for all scan operations by setting this 
configuration property to one of the following
+implementations.</p>
 <div class="pst-scrollable-table-container"><table class="table">
 <thead>
-<tr class="row-odd"><th class="head"><p>Label</p></th>
-<th class="head"><p>Meaning</p></th>
+<tr class="row-odd"><th class="head"><p>Implementation</p></th>
+<th class="head"><p>Description</p></th>
 </tr>
 </thead>
 <tbody>
-<tr class="row-even"><td><p>jvm_heapUsed</p></td>
-<td><p>JVM heap memory usage of live objects for the executor process</p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code></p></td>
+<td><p>This implementation provides strong compatibility with Spark but does 
not support complex types. This is the original scan implementation in Comet 
and may eventually be removed.</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code></p></td>
+<td><p>This implementation delegates to DataFusion’s <code class="docutils 
literal notranslate"><span class="pre">DataSourceExec</span></code> but uses a 
hybrid approach of JVM and native code. This scan is designed to be integrated 
with Iceberg in the future.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code></p></td>
+<td><p>This experimental implementation delegates to DataFusion’s <code 
class="docutils literal notranslate"><span 
class="pre">DataSourceExec</span></code> for full native execution. There are 
known compatibility issues when using this scan.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
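A minimal sketch of forcing one implementation, assuming Comet is already enabled for the session (the plugin and jar options are omitted here and assumed to be configured elsewhere, for example in `spark-defaults.conf`). The value `native_iceberg_compat` is one of the three names in the table above and is chosen only for illustration:

```shell
# Assumption: SPARK_HOME points at a Spark installation with Comet already set up.
# Force the hybrid JVM/native scan instead of letting `auto` pick an implementation.
$SPARK_HOME/bin/spark-shell \
  --conf spark.comet.scan.impl=native_iceberg_compat
```

Dropping the `--conf` line returns to the default `auto` behavior.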
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> scans 
provide the following benefits over the <code class="docutils literal 
notranslate"><span class="pre">native_comet</span></code>
+implementation:</p>
+<ul class="simple">
+<li><p>Leverages the DataFusion community’s ongoing improvements to <code 
class="docutils literal notranslate"><span 
class="pre">DataSourceExec</span></code></p></li>
+<li><p>Provides support for reading complex types (structs, arrays, and 
maps)</p></li>
+<li><p>Removes the use of reusable mutable-buffers in Comet, which is complex 
to maintain</p></li>
+<li><p>Improves performance</p></li>
+</ul>
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> scans share 
the following limitations:</p>
+<ul class="simple">
+<li><p>When reading Parquet files written by systems other than Spark that 
contain columns with the logical types <code class="docutils literal 
notranslate"><span class="pre">UINT_8</span></code>
+or <code class="docutils literal notranslate"><span 
class="pre">UINT_16</span></code>, Comet will produce different results than 
Spark because Spark does not preserve or understand these
+logical types. Arrow-based readers, such as DataFusion and Comet, do respect 
these types and read the data as unsigned
+rather than signed. By default, Comet will fall back to <code class="docutils 
literal notranslate"><span class="pre">native_comet</span></code> when scanning 
Parquet files containing <code class="docutils literal notranslate"><span 
class="pre">byte</span></code> or <code class="docutils literal 
notranslate"><span class="pre">short</span></code>
+types (regardless of the logical type). This behavior can be disabled by 
setting
+<code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.allowIncompatible=true</span></code>.</p></li>
+<li><p>No support for default values that are nested types (e.g., maps, 
arrays, structs). Literal default values are supported.</p></li>
+</ul>
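As a sketch of the opt-out described in the first limitation, the fallback to `native_comet` for `byte` and `short` columns could be disabled as follows. This assumes Comet is otherwise configured already; with this setting, `UINT_8` and `UINT_16` columns may be read as unsigned, so results can differ from Spark:

```shell
# Assumption: SPARK_HOME points at a Spark installation with Comet already set up.
# Accept the known incompatibility instead of falling back to native_comet.
$SPARK_HOME/bin/spark-shell \
  --conf spark.comet.scan.allowIncompatible=true
```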
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> scan has some additional 
limitations:</p>
+<ul class="simple">
+<li><p>Bucketed scans are not supported</p></li>
+<li><p>No support for row indexes</p></li>
+<li><p><code class="docutils literal notranslate"><span 
class="pre">PARQUET_FIELD_ID_READ_ENABLED</span></code> is not respected <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1758">#1758</a></p></li>
+<li><p>There are failures in the Spark SQL test suite <a class="reference 
external" 
href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a></p></li>
+<li><p>Setting Spark configs <code class="docutils literal notranslate"><span 
class="pre">ignoreMissingFiles</span></code> or <code class="docutils literal 
notranslate"><span class="pre">ignoreCorruptFiles</span></code> to <code 
class="docutils literal notranslate"><span class="pre">true</span></code> is 
not compatible with Spark</p></li>
+</ul>
+<section id="s3-support">
+<h2>S3 Support<a class="headerlink" href="#s3-support" title="Link to this 
heading">#</a></h2>
+<p>There are some</p>
+<section id="native-comet">
+<h3><code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code><a class="headerlink" 
href="#native-comet" title="Link to this heading">#</a></h3>
+<p>The default <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> Parquet scan implementation reads data 
from S3 using the <a class="reference external" 
href="https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html">Hadoop-AWS
 module</a>, which
+is identical to the approach commonly used with vanilla Spark. AWS credential 
configuration and other Hadoop S3A
+configurations work the same way as in vanilla Spark.</p>
+</section>
+<section id="native-datafusion-and-native-iceberg-compat">
+<h3><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code><a 
class="headerlink" href="#native-datafusion-and-native-iceberg-compat" 
title="Link to this heading">#</a></h3>
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> Parquet scan 
implementations completely offload data loading
+to native code. They use the <a class="reference external" 
href="https://crates.io/crates/object_store"><code class="docutils literal 
notranslate"><span class="pre">object_store</span></code> crate</a> to read 
data from S3 and
+support configuring S3 access using standard <a class="reference external" 
href="https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration">Hadoop
 S3A configurations</a> by translating them to
+the <code class="docutils literal notranslate"><span 
class="pre">object_store</span></code> crate’s format.</p>
+<p>This implementation maintains compatibility with existing Hadoop S3A 
configurations, so existing code will
+continue to work as long as the configurations are supported and can be 
translated without loss of functionality.</p>
+<section id="additional-s3-configuration-options">
+<h4>Additional S3 Configuration Options<a class="headerlink" 
href="#additional-s3-configuration-options" title="Link to this 
heading">#</a></h4>
+<p>Beyond credential providers, the <code class="docutils literal 
notranslate"><span class="pre">native_datafusion</span></code> implementation 
supports additional S3 configuration options:</p>
+<div class="pst-scrollable-table-container"><table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Option</p></th>
+<th class="head"><p>Description</p></th>
 </tr>
-<tr class="row-odd"><td><p>jemalloc_allocated</p></td>
-<td><p>Native memory usage for the executor process</p></td>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.endpoint</span></code></p></td>
+<td><p>The endpoint of the S3 service</p></td>
 </tr>
-<tr class="row-even"><td><p>task_memory_comet_NNN</p></td>
-<td><p>Off-heap memory allocated by Comet for query execution</p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.endpoint.region</span></code></p></td>
+<td><p>The AWS region for the S3 service. If not specified, the region will be 
auto-detected.</p></td>
 </tr>
-<tr class="row-odd"><td><p>task_memory_spark_NNN</p></td>
-<td><p>On-heap &amp; Off-heap memory allocated by Spark</p></td>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.path.style.access</span></code></p></td>
+<td><p>Whether to use path style access for the S3 service (true/false, 
defaults to virtual hosted style)</p></td>
 </tr>
-<tr class="row-even"><td><p>comet_shuffle_NNN</p></td>
-<td><p>Off-heap memory allocated by Comet for columnar shuffle</p></td>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.requester.pays.enabled</span></code></p></td>
+<td><p>Whether to enable requester pays for S3 requests (true/false)</p></td>
 </tr>
 </tbody>
 </table>
 </div>
+<p>All configuration options support bucket-specific overrides using the 
pattern <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.bucket.{bucket-name}.{option}</span></code>.</p>
+</section>
+<section id="examples">
+<h4>Examples<a class="headerlink" href="#examples" title="Link to this 
heading">#</a></h4>
+<p>The following examples demonstrate how to configure S3 access with the 
<code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> Parquet scan implementation using 
different authentication methods.</p>
+<p><strong>Example 1: Simple Credentials</strong></p>
+<p>This example shows how to access a private S3 bucket using an access key 
and secret key. The <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> configuration can be 
omitted since <code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</span></code> 
is included in Hadoop S3A’s default credential provider chain.</p>
+<div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span><span 
class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span 
class="se">\</span>
+...
+--conf<span class="w"> </span>spark.comet.scan.impl<span 
class="o">=</span>native_datafusion<span class="w"> </span><span 
class="se">\</span>
+--conf<span class="w"> </span>spark.hadoop.fs.s3a.access.key<span 
class="o">=</span>my-access-key<span class="w"> </span><span class="se">\</span>
+--conf<span class="w"> </span>spark.hadoop.fs.s3a.secret.key<span 
class="o">=</span>my-secret-key
+...
+</pre></div>
+</div>
+<p><strong>Example 2: Assume Role with Web Identity Token</strong></p>
+<p>This example demonstrates using an assumed role credential to access a 
private S3 bucket, where the base credential for assuming the role is provided 
by a web identity token credentials provider.</p>
+<div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span><span 
class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span 
class="se">\</span>
+...
+--conf<span class="w"> </span>spark.comet.scan.impl<span 
class="o">=</span>native_datafusion<span class="w"> </span><span 
class="se">\</span>
+--conf<span class="w"> 
</span>spark.hadoop.fs.s3a.aws.credentials.provider<span 
class="o">=</span>org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider<span
 class="w"> </span><span class="se">\</span>
+--conf<span class="w"> </span>spark.hadoop.fs.s3a.assumed.role.arn<span 
class="o">=</span>arn:aws:iam::123456789012:role/my-role<span class="w"> 
</span><span class="se">\</span>
+--conf<span class="w"> 
</span>spark.hadoop.fs.s3a.assumed.role.session.name<span 
class="o">=</span>my-session<span class="w"> </span><span class="se">\</span>
+--conf<span class="w"> 
</span>spark.hadoop.fs.s3a.assumed.role.credentials.provider<span 
class="o">=</span>com.amazonaws.auth.WebIdentityTokenCredentialsProvider
+...
+</pre></div>
+</div>
+</section>
+<section id="limitations">
+<h4>Limitations<a class="headerlink" href="#limitations" title="Link to this 
heading">#</a></h4>
+<p>The S3 support of <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> has the following limitations:</p>
+<ol class="arabic simple">
+<li><p><strong>Partial Hadoop S3A configuration support</strong>: Not all 
Hadoop S3A configurations are currently supported. Only the configurations 
listed in the tables above are translated and applied to the underlying <code 
class="docutils literal notranslate"><span 
class="pre">object_store</span></code> crate.</p></li>
+<li><p><strong>Custom credential providers</strong>: Custom implementations of 
AWS credential providers are not supported. The implementation only supports 
the standard credential providers listed in the table above. We are planning to 
add support for custom credential providers through a JNI-based adapter that 
will allow calling Java credential providers from native code. See <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1829">issue #1829</a> 
for  [...]
+</ol>
+</section>
+</section>
 </section>
 </section>
 
@@ -513,20 +596,20 @@ directory with the filename <code class="docutils literal 
notranslate"><span cla
                   
 <div class="prev-next-area">
     <a class="left-prev"
-       href="adding_a_new_expression.html"
+       href="ffi.html"
        title="previous page">
       <i class="fa-solid fa-angle-left"></i>
       <div class="prev-next-info">
         <p class="prev-next-subtitle">previous</p>
-        <p class="prev-next-title">Adding a New Expression</p>
+        <p class="prev-next-title">Arrow FFI Usage in Comet</p>
       </div>
     </a>
     <a class="right-next"
-       href="profiling_native_code.html"
+       href="development.html"
        title="next page">
       <div class="prev-next-info">
         <p class="prev-next-subtitle">next</p>
-        <p class="prev-next-title">Profiling Native Code</p>
+        <p class="prev-next-title">Comet Development Guide</p>
       </div>
       <i class="fa-solid fa-angle-right"></i>
     </a>
diff --git a/contributor-guide/plugin_overview.html 
b/contributor-guide/plugin_overview.html
index 43df5e530..cd632fef3 100644
--- a/contributor-guide/plugin_overview.html
+++ b/contributor-guide/plugin_overview.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2 current"><a class="current reference internal" 
href="#">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/contributor-guide/profiling_native_code.html 
b/contributor-guide/profiling_native_code.html
index eef562b10..12afcbc1c 100644
--- a/contributor-guide/profiling_native_code.html
+++ b/contributor-guide/profiling_native_code.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/contributor-guide/roadmap.html b/contributor-guide/roadmap.html
index ce9b33aa3..8bd2dd75d 100644
--- a/contributor-guide/roadmap.html
+++ b/contributor-guide/roadmap.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/contributor-guide/spark-sql-tests.html 
b/contributor-guide/spark-sql-tests.html
index 6652284c8..33b687156 100644
--- a/contributor-guide/spark-sql-tests.html
+++ b/contributor-guide/spark-sql-tests.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/contributor-guide/tracing.html b/contributor-guide/tracing.html
index 74cffacaa..2245a3ce0 100644
--- a/contributor-guide/tracing.html
+++ b/contributor-guide/tracing.html
@@ -357,6 +357,7 @@ under the License.
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html">Comet Plugin Architecture</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="plugin_overview.html#plugin-components">Plugin Components</a></li>
 <li class="toctree-l2"><a class="reference internal" href="ffi.html">Arrow 
FFI</a></li>
+<li class="toctree-l2"><a class="reference internal" 
href="parquet_scans.html">Parquet Scans</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="development.html">Development Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="debugging.html">Debugging Guide</a></li>
 <li class="toctree-l2"><a class="reference internal" 
href="benchmarking.html">Benchmarking Guide</a></li>
diff --git a/objects.inv b/objects.inv
index 6771033be..cc6ac2420 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/searchindex.js b/searchindex.js
index 60ce38c92..32384baf1 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[17, "install-comet"]], 
"2. Clone Spark and Apply Diff": [[17, "clone-spark-and-apply-diff"]], "3. Run 
Spark SQL Tests": [[17, "run-spark-sql-tests"]], "ANSI Mode": [[20, 
"ansi-mode"], [33, "ansi-mode"], [73, "ansi-mode"]], "ANSI mode": [[46, 
"ansi-mode"], [59, "ansi-mode"]], "API Differences Between Spark Versions": 
[[3, "api-differences-between-spark-versions"]], "ASF Links": [[2, null], [2, 
null]], "Accelerating Apache Iceberg Parque [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[18, "install-comet"]], 
"2. Clone Spark and Apply Diff": [[18, "clone-spark-and-apply-diff"]], "3. Run 
Spark SQL Tests": [[18, "run-spark-sql-tests"]], "ANSI Mode": [[21, 
"ansi-mode"], [34, "ansi-mode"], [74, "ansi-mode"]], "ANSI mode": [[47, 
"ansi-mode"], [60, "ansi-mode"]], "API Differences Between Spark Versions": 
[[3, "api-differences-between-spark-versions"]], "ASF Links": [[2, null], [2, 
null]], "Accelerating Apache Iceberg Parque [...]
\ No newline at end of file
diff --git a/user-guide/latest/compatibility.html 
b/user-guide/latest/compatibility.html
index b55c99b35..5903394bd 100644
--- a/user-guide/latest/compatibility.html
+++ b/user-guide/latest/compatibility.html
@@ -464,71 +464,11 @@ under the License.
 <p>This guide offers information about areas of functionality where there are 
known differences.</p>
 <section id="parquet">
 <h2>Parquet<a class="headerlink" href="#parquet" title="Link to this 
heading">#</a></h2>
-<section id="data-type-support">
-<h3>Data Type Support<a class="headerlink" href="#data-type-support" 
title="Link to this heading">#</a></h3>
-<p>Comet does not support reading decimals encoded in binary format.</p>
-</section>
-<section id="parquet-scans">
-<h3>Parquet Scans<a class="headerlink" href="#parquet-scans" title="Link to 
this heading">#</a></h3>
-<p>Comet currently has three distinct implementations of the Parquet scan 
operator. The configuration property
-<code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.impl</span></code> is used to select an 
implementation. The default setting is <code class="docutils literal 
notranslate"><span class="pre">spark.comet.scan.impl=auto</span></code>, and
-Comet will choose the most appropriate implementation based on the Parquet 
schema and other Comet configuration
-settings. Most users should not need to change this setting. However, it is 
possible to force Comet to try and use
-a particular implementation for all scan operations by setting this 
configuration property to one of the following
-implementations.</p>
-<div class="pst-scrollable-table-container"><table class="table">
-<thead>
-<tr class="row-odd"><th class="head"><p>Implementation</p></th>
-<th class="head"><p>Description</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code></p></td>
-<td><p>This implementation provides strong compatibility with Spark but does 
not support complex types. This is the original scan implementation in Comet 
and may eventually be removed.</p></td>
-</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code></p></td>
-<td><p>This implementation delegates to DataFusion’s <code class="docutils 
literal notranslate"><span class="pre">DataSourceExec</span></code> but uses a 
hybrid approach of JVM and native code. This scan is designed to be integrated 
with Iceberg in the future.</p></td>
-</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code></p></td>
-<td><p>This experimental implementation delegates to DataFusion’s <code 
class="docutils literal notranslate"><span 
class="pre">DataSourceExec</span></code> for full native execution. There are 
known compatibility issues when using this scan.</p></td>
-</tr>
-</tbody>
-</table>
-</div>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> scans 
provide the following benefits over the <code class="docutils literal 
notranslate"><span class="pre">native_comet</span></code>
-implementation:</p>
-<ul class="simple">
-<li><p>Leverages the DataFusion community’s ongoing improvements to <code 
class="docutils literal notranslate"><span 
class="pre">DataSourceExec</span></code></p></li>
-<li><p>Provides support for reading complex types (structs, arrays, and 
maps)</p></li>
-<li><p>Removes the use of reusable mutable-buffers in Comet, which is complex 
to maintain</p></li>
-<li><p>Improves performance</p></li>
-</ul>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> scans share 
the following limitations:</p>
+<p>Comet has the following limitations when reading Parquet files:</p>
 <ul class="simple">
-<li><p>When reading Parquet files written by systems other than Spark that 
contain columns with the logical types <code class="docutils literal 
notranslate"><span class="pre">UINT_8</span></code>
-or <code class="docutils literal notranslate"><span 
class="pre">UINT_16</span></code>, Comet will produce different results than 
Spark because Spark does not preserve or understand these
-logical types. Arrow-based readers, such as DataFusion and Comet do respect 
these types and read the data as unsigned
-rather than signed. By default, Comet will fall back to <code class="docutils 
literal notranslate"><span class="pre">native_comet</span></code> when scanning 
Parquet files containing <code class="docutils literal notranslate"><span 
class="pre">byte</span></code> or <code class="docutils literal 
notranslate"><span class="pre">short</span></code>
-types (regardless of the logical type). This behavior can be disabled by 
setting
-<code class="docutils literal notranslate"><span 
class="pre">spark.comet.scan.allowIncompatible=true</span></code>.</p></li>
+<li><p>Comet does not support reading decimals encoded in binary 
format.</p></li>
 <li><p>No support for default values that are nested types (e.g., maps, 
arrays, structs). Literal default values are supported.</p></li>
 </ul>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> scan has some additional 
limitations:</p>
-<ul class="simple">
-<li><p>Bucketed scans are not supported</p></li>
-<li><p>No support for row indexes</p></li>
-<li><p><code class="docutils literal notranslate"><span 
class="pre">PARQUET_FIELD_ID_READ_ENABLED</span></code> is not respected <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1758">#1758</a></p></li>
-<li><p>There are failures in the Spark SQL test suite <a class="reference 
external" 
href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a></p></li>
-<li><p>Setting Spark configs <code class="docutils literal notranslate"><span 
class="pre">ignoreMissingFiles</span></code> or <code class="docutils literal 
notranslate"><span class="pre">ignoreCorruptFiles</span></code> to <code 
class="docutils literal notranslate"><span class="pre">true</span></code> is 
not compatible with Spark</p></li>
-</ul>
-</section>
-<section id="s3-support-with-native-iceberg-compat">
-<h3>S3 Support with <code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code><a class="headerlink" 
href="#s3-support-with-native-iceberg-compat" title="Link to this 
heading">#</a></h3>
-<ul class="simple">
-<li><p>When using the default AWS S3 endpoint (no custom endpoint configured), 
a valid region is required. Comet
-will attempt to resolve the region if it is not provided.</p></li>
-</ul>
-</section>
 </section>
 <section id="ansi-mode">
 <h2>ANSI Mode<a class="headerlink" href="#ansi-mode" title="Link to this 
heading">#</a></h2>
@@ -551,18 +491,14 @@ So Comet will add additional normalization expression of 
NaN and zero for compar
 <p>Sorting on floating-point data types (or complex types containing 
floating-point values) is not compatible with
 Spark if the data contains both zero and negative zero. This is likely an edge 
case that is not of concern for many users
 and sorting on floating-point data can be enabled by setting <code 
class="docutils literal notranslate"><span 
class="pre">spark.comet.expression.SortOrder.allowIncompatible=true</span></code>.</p>
-<p>There is a known bug with using count(distinct) within aggregate queries, 
where each NaN value will be counted
-separately <a class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1824">#1824</a>.</p>
 </section>
 <section id="incompatible-expressions">
 <h2>Incompatible Expressions<a class="headerlink" 
href="#incompatible-expressions" title="Link to this heading">#</a></h2>
-<p>Some Comet native expressions are not 100% compatible with Spark and are 
disabled by default. These expressions
-will fall back to Spark but can be enabled by setting <code class="docutils 
literal notranslate"><span 
class="pre">spark.comet.expression.allowIncompatible=true</span></code>.</p>
-</section>
-<section id="array-expressions">
-<h2>Array Expressions<a class="headerlink" href="#array-expressions" 
title="Link to this heading">#</a></h2>
-<p>Comet has experimental support for a number of array expressions. These are 
experimental and currently marked
-as incompatible and can be enabled by setting <code class="docutils literal 
notranslate"><span 
class="pre">spark.comet.expression.allowIncompatible=true</span></code>.</p>
+<p>Expressions that are not 100% Spark-compatible will fall back to Spark by 
default and can be enabled by setting
+<code class="docutils literal notranslate"><span 
class="pre">spark.comet.expression.EXPRNAME.allowIncompatible=true</span></code>,
 where <code class="docutils literal notranslate"><span 
class="pre">EXPRNAME</span></code> is the Spark expression class name. See
+the <a class="reference internal" href="expressions.html"><span class="std 
std-doc">Comet Supported Expressions Guide</span></a> for more information on 
this configuration setting.</p>
+<p>It is also possible to specify <code class="docutils literal 
notranslate"><span 
class="pre">spark.comet.expression.allowIncompatible=true</span></code> to 
enable all
+incompatible expressions.</p>
 </section>
 <section id="regular-expressions">
 <h2>Regular Expressions<a class="headerlink" href="#regular-expressions" 
title="Link to this heading">#</a></h2>
@@ -577,7 +513,7 @@ this can be overridden by setting <code class="docutils 
literal notranslate"><sp
 <li><p><strong>Compatible</strong>: The results match Apache Spark</p></li>
 <li><p><strong>Incompatible</strong>: The results may match Apache Spark for 
some inputs, but there are known issues where some inputs
 will result in incorrect results or exceptions. The query stage will fall back 
to Spark by default. Setting
-<code class="docutils literal notranslate"><span 
class="pre">spark.comet.expression.allowIncompatible=true</span></code> will 
allow all incompatible casts to run natively in Comet, but this is not
+<code class="docutils literal notranslate"><span 
class="pre">spark.comet.expression.Cast.allowIncompatible=true</span></code> 
will allow all incompatible casts to run natively in Comet, but this is not
 recommended for production use.</p></li>
 <li><p><strong>Unsupported</strong>: Comet does not provide a native version 
of this cast expression and the query stage will fall back to
 Spark.</p></li>
diff --git a/user-guide/latest/datasources.html 
b/user-guide/latest/datasources.html
index 0cc56d422..0b951336e 100644
--- a/user-guide/latest/datasources.html
+++ b/user-guide/latest/datasources.html
@@ -598,25 +598,16 @@ Input<span class="w"> </span><span 
class="o">[</span><span class="m">3</span><sp
 </section>
 <section id="s3">
 <h2>S3<a class="headerlink" href="#s3" title="Link to this heading">#</a></h2>
-<p>DataFusion Comet has <a class="reference internal" 
href="compatibility.html#parquet-scans"><span class="std std-ref">multiple 
Parquet scan implementations</span></a> that use different approaches to read 
data from S3.</p>
-<section id="native-comet">
-<h3><code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code><a class="headerlink" 
href="#native-comet" title="Link to this heading">#</a></h3>
-<p>The default <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> Parquet scan implementation reads data 
from S3 using the <a class="reference external" 
href="https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html">Hadoop-AWS
 module</a>, which is identical to the approach commonly used with vanilla 
Spark. AWS credential configuration and other Hadoop S3A configurations works 
the same way as in vanilla Spark.</p>
-</section>
-<section id="native-datafusion-and-native-iceberg-compat">
-<h3><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code><a 
class="headerlink" href="#native-datafusion-and-native-iceberg-compat" 
title="Link to this heading">#</a></h3>
-<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> Parquet scan 
implementations completely offload data loading to native code. They use the <a 
class="reference external" href="https://crates.io/crates/object_store";><code 
class="docutils literal notranslate"><span 
class="pre">object_store</span></code> crate</a> to read data from S3 and sup 
[...]
-<p>This implementation maintains compatibility with existing Hadoop S3A 
configurations, so existing code will continue to work as long as the 
configurations are supported and can be translated without loss of 
functionality.</p>
 <section id="root-ca-certificates">
-<h4>Root CA Certificates<a class="headerlink" href="#root-ca-certificates" 
title="Link to this heading">#</a></h4>
-<p>One major difference between <code class="docutils literal 
notranslate"><span class="pre">native_comet</span></code> and the other scan 
implementations is the mechanism for discovering Root
-CA Certificates. The <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> scan uses the JVM to read CA 
Certificates from the Java Trust Store, but the native
-scan implementations <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> and <code class="docutils literal 
notranslate"><span class="pre">native_iceberg_compat</span></code> use system 
Root CA Certificates (typically stored
+<h3>Root CA Certificates<a class="headerlink" href="#root-ca-certificates" 
title="Link to this heading">#</a></h3>
+<p>One major difference between Spark and Comet is the mechanism for 
discovering Root
+CA Certificates. Spark uses the JVM to read CA Certificates from the Java 
Trust Store, but native Comet
+scans use system Root CA Certificates (typically stored
 in <code class="docutils literal notranslate"><span 
class="pre">/etc/ssl/certs</span></code> on Linux). These scans will not be 
able to interact with S3 if the Root CA Certificates are not
 installed.</p>
 </section>
 <section id="supported-credential-providers">
-<h4>Supported Credential Providers<a class="headerlink" 
href="#supported-credential-providers" title="Link to this heading">#</a></h4>
+<h3>Supported Credential Providers<a class="headerlink" 
href="#supported-credential-providers" title="Link to this heading">#</a></h3>
 <p>AWS credential providers can be configured using the <code class="docutils 
literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> configuration. The 
following table shows the supported credential providers and their 
configuration options:</p>
 <div class="pst-scrollable-table-container"><table class="table">
 <thead>
@@ -667,68 +658,6 @@ installed.</p>
 </div>
 <p>Multiple credential providers can be specified in a comma-separated list 
using the <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> configuration, just 
as Hadoop AWS supports. If <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> is not configured, 
Hadoop S3A’s default credential provider chain will be used. All configuration 
options also support bucket-specific overrides  [...]
 </section>
-<section id="additional-s3-configuration-options">
-<h4>Additional S3 Configuration Options<a class="headerlink" 
href="#additional-s3-configuration-options" title="Link to this 
heading">#</a></h4>
-<p>Beyond credential providers, the <code class="docutils literal 
notranslate"><span class="pre">native_datafusion</span></code> implementation 
supports additional S3 configuration options:</p>
-<div class="pst-scrollable-table-container"><table class="table">
-<thead>
-<tr class="row-odd"><th class="head"><p>Option</p></th>
-<th class="head"><p>Description</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.endpoint</span></code></p></td>
-<td><p>The endpoint of the S3 service</p></td>
-</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.endpoint.region</span></code></p></td>
-<td><p>The AWS region for the S3 service. If not specified, the region will be 
auto-detected.</p></td>
-</tr>
-<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.path.style.access</span></code></p></td>
-<td><p>Whether to use path style access for the S3 service (true/false, 
defaults to virtual hosted style)</p></td>
-</tr>
-<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.requester.pays.enabled</span></code></p></td>
-<td><p>Whether to enable requester pays for S3 requests (true/false)</p></td>
-</tr>
-</tbody>
-</table>
-</div>
-<p>All configuration options support bucket-specific overrides using the 
pattern <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.bucket.{bucket-name}.{option}</span></code>.</p>
-</section>
-<section id="examples">
-<h4>Examples<a class="headerlink" href="#examples" title="Link to this 
heading">#</a></h4>
-<p>The following examples demonstrate how to configure S3 access with the 
<code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> Parquet scan implementation using 
different authentication methods.</p>
-<p><strong>Example 1: Simple Credentials</strong></p>
-<p>This example shows how to access a private S3 bucket using an access key 
and secret key. The <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> configuration can be 
omitted since <code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</span></code> 
is included in Hadoop S3A’s default credential provider chain.</p>
-<div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span><span 
class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span 
class="se">\</span>
-...
---conf<span class="w"> </span>spark.comet.scan.impl<span 
class="o">=</span>native_datafusion<span class="w"> </span><span 
class="se">\</span>
---conf<span class="w"> </span>spark.hadoop.fs.s3a.access.key<span 
class="o">=</span>my-access-key<span class="w"> </span><span class="se">\</span>
---conf<span class="w"> </span>spark.hadoop.fs.s3a.secret.key<span 
class="o">=</span>my-secret-key
-...
-</pre></div>
-</div>
-<p><strong>Example 2: Assume Role with Web Identity Token</strong></p>
-<p>This example demonstrates using an assumed role credential to access a 
private S3 bucket, where the base credential for assuming the role is provided 
by a web identity token credentials provider.</p>
-<div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span><span 
class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span 
class="se">\</span>
-...
---conf<span class="w"> </span>spark.comet.scan.impl<span 
class="o">=</span>native_datafusion<span class="w"> </span><span 
class="se">\</span>
---conf<span class="w"> 
</span>spark.hadoop.fs.s3a.aws.credentials.provider<span 
class="o">=</span>org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider<span
 class="w"> </span><span class="se">\</span>
---conf<span class="w"> </span>spark.hadoop.fs.s3a.assumed.role.arn<span 
class="o">=</span>arn:aws:iam::123456789012:role/my-role<span class="w"> 
</span><span class="se">\</span>
---conf<span class="w"> 
</span>spark.hadoop.fs.s3a.assumed.role.session.name<span 
class="o">=</span>my-session<span class="w"> </span><span class="se">\</span>
---conf<span class="w"> 
</span>spark.hadoop.fs.s3a.assumed.role.credentials.provider<span 
class="o">=</span>com.amazonaws.auth.WebIdentityTokenCredentialsProvider
-...
-</pre></div>
-</div>
-</section>
-<section id="limitations">
-<h4>Limitations<a class="headerlink" href="#limitations" title="Link to this 
heading">#</a></h4>
-<p>The S3 support of <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> has the following limitations:</p>
-<ol class="arabic simple">
-<li><p><strong>Partial Hadoop S3A configuration support</strong>: Not all 
Hadoop S3A configurations are currently supported. Only the configurations 
listed in the tables above are translated and applied to the underlying <code 
class="docutils literal notranslate"><span 
class="pre">object_store</span></code> crate.</p></li>
-<li><p><strong>Custom credential providers</strong>: Custom implementations of 
AWS credential providers are not supported. The implementation only supports 
the standard credential providers listed in the table above. We are planning to 
add support for custom credential providers through a JNI-based adapter that 
will allow calling Java credential providers from native code. See <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1829">issue #1829</a> 
for  [...]
-</ol>
-</section>
-</section>
 </section>
 </section>
 

