(datafusion-comet) branch asf-site updated: Publish built docs triggered by 7cf2e9dc9f1cba4f172ea6bdc1a6ac23c859b4d7

github-bot Tue, 03 Jun 2025 10:37:03 -0700

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 912a7e907 Publish built docs triggered by 7cf2e9dc9f1cba4f172ea6bdc1a6ac23c859b4d7
912a7e907 is described below

commit 912a7e907002fba3f12296d475a2ac63ee6798ea
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue Jun 3 17:36:03 2025 +0000

    Publish built docs triggered by 7cf2e9dc9f1cba4f172ea6bdc1a6ac23c859b4d7
---
 _sources/user-guide/datasources.md.txt |  89 ++++++++++++++++-
 searchindex.js                         |   2 +-
 user-guide/datasources.html            | 177 ++++++++++++++++++++++++++++++++-
 3 files changed, 265 insertions(+), 3 deletions(-)

diff --git a/_sources/user-guide/datasources.md.txt b/_sources/user-guide/datasources.md.txt
index ddf02770e..e6a550926 100644
--- a/_sources/user-guide/datasources.md.txt
+++ b/_sources/user-guide/datasources.md.txt
@@ -154,5 +154,92 @@ JAVA_HOME="/opt/homebrew/opt/openjdk@11" make release PROFILES="-Pspark-3.5" COM
   }
 ```
 Or use `spark-shell` with HDFS support as described [above](#using-experimental-native-datafusion-reader)
+
 ## S3
-In progress 
+
+DataFusion Comet has [multiple Parquet scan implementations](./compatibility.md#parquet-scans) that use different approaches to read data from S3.
+
+### `native_comet`
+
+The default `native_comet` Parquet scan implementation reads data from S3 using the [Hadoop-AWS module](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html), which is identical to the approach commonly used with vanilla Spark. AWS credential configuration and other Hadoop S3A configurations work the same way as in vanilla Spark.
+
+### `native_datafusion`
+
+The `native_datafusion` Parquet scan implementation completely offloads data loading to native code. It uses the [`object_store` crate](https://crates.io/crates/object_store) to read data from S3 and supports configuring S3 access using standard [Hadoop S3A configurations](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#General_S3A_Client_configuration) by translating them to the `object_store` crate's format.
+
+This implementation maintains compatibility with existing Hadoop S3A configurations, so existing code will continue to work as long as the configurations are supported and can be translated without loss of functionality.
+
+#### Supported Credential Providers
+
+AWS credential providers can be configured using the `fs.s3a.aws.credentials.provider` configuration. The following table shows the supported credential providers and their configuration options:
+
+| Credential provider | Description | Supported Options |
+|---------------------|-------------|-------------------|
+| `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` | Access S3 using access key and secret key | `fs.s3a.access.key`, `fs.s3a.secret.key` |
+| `org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider` | Access S3 using temporary credentials | `fs.s3a.access.key`, `fs.s3a.secret.key`, `fs.s3a.session.token` |
+| `org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider` | Access S3 using AWS STS assume role | `fs.s3a.assumed.role.arn`, `fs.s3a.assumed.role.session.name` (optional), `fs.s3a.assumed.role.credentials.provider` (optional) |
+| `org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider` | Access S3 using EC2 instance profile or ECS task credentials (tries ECS first, then IMDS) | None (auto-detected) |
+| `org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider`<br/>`com.amazonaws.auth.AnonymousAWSCredentials`<br/>`software.amazon.awssdk.auth.credentials.AnonymousCredentialsProvider` | Access S3 without authentication (public buckets only) | None |
+| `com.amazonaws.auth.EnvironmentVariableCredentialsProvider`<br/>`software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider` | Load credentials from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) | None |
+| `com.amazonaws.auth.InstanceProfileCredentialsProvider`<br/>`software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider` | Access S3 using EC2 instance metadata service (IMDS) | None |
+| `com.amazonaws.auth.ContainerCredentialsProvider`<br/>`software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider`<br/>`com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper` | Access S3 using ECS task credentials | None |
+| `com.amazonaws.auth.WebIdentityTokenCredentialsProvider`<br/>`software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider` | Authenticate using web identity token file | None |
+
+Multiple credential providers can be specified as a comma-separated list in the `fs.s3a.aws.credentials.provider` configuration, as Hadoop-AWS also allows. If `fs.s3a.aws.credentials.provider` is not configured, Hadoop S3A's default credential provider chain is used. All configuration options also support bucket-specific overrides using the pattern `fs.s3a.bucket.{bucket-name}.{option}`.
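+
+A sketch combining both features, assuming a hypothetical public bucket named `my-public-bucket`: the global setting lists two providers (Hadoop-AWS tries them in the listed order; this sketch assumes `native_datafusion` behaves the same way), while the bucket-specific setting switches that one bucket to anonymous access.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+...
+--conf spark.comet.scan.impl=native_datafusion \
+--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.EnvironmentVariableCredentialsProvider,org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider \
+--conf spark.hadoop.fs.s3a.bucket.my-public-bucket.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider
+...
+```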
+
+#### Additional S3 Configuration Options
+
+Beyond credential providers, the `native_datafusion` implementation supports additional S3 configuration options:
+
+| Option | Description |
+|--------|-------------|
+| `fs.s3a.endpoint` | The endpoint of the S3 service |
+| `fs.s3a.endpoint.region` | The AWS region for the S3 service. If not specified, the region will be auto-detected. |
+| `fs.s3a.path.style.access` | Whether to use path style access for the S3 service (true/false, defaults to virtual hosted style) |
+| `fs.s3a.requester.pays.enabled` | Whether to enable requester pays for S3 requests (true/false) |
+
+All configuration options support bucket-specific overrides using the pattern `fs.s3a.bucket.{bucket-name}.{option}`.
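+
+For example, one bucket could be served by an S3-compatible store while another is read with requester pays enabled. In the sketch below, the endpoint URL, region, and bucket names are hypothetical placeholders; only the option names come from the table above.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+...
+--conf spark.comet.scan.impl=native_datafusion \
+--conf spark.hadoop.fs.s3a.bucket.my-local-bucket.endpoint=https://s3.example.internal:9000 \
+--conf spark.hadoop.fs.s3a.bucket.my-local-bucket.endpoint.region=us-east-1 \
+--conf spark.hadoop.fs.s3a.bucket.my-local-bucket.path.style.access=true \
+--conf spark.hadoop.fs.s3a.bucket.my-requester-pays-bucket.requester.pays.enabled=true
+...
+```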
+
+#### Examples
+
+The following examples demonstrate how to configure S3 access with the `native_datafusion` Parquet scan implementation using different authentication methods.
+
+**Example 1: Simple Credentials**
+
+This example shows how to access a private S3 bucket using an access key and secret key. The `fs.s3a.aws.credentials.provider` configuration can be omitted since `org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider` is included in Hadoop S3A's default credential provider chain.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+...
+--conf spark.comet.scan.impl=native_datafusion \
+--conf spark.hadoop.fs.s3a.access.key=my-access-key \
+--conf spark.hadoop.fs.s3a.secret.key=my-secret-key
+...
+```
+
+**Example 2: Assume Role with Web Identity Token**
+
+This example demonstrates using an assumed role credential to access a private S3 bucket, where the base credential for assuming the role is provided by a web identity token credentials provider.
+
+```shell
+$SPARK_HOME/bin/spark-shell \
+...
+--conf spark.comet.scan.impl=native_datafusion \
+--conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
+--conf spark.hadoop.fs.s3a.assumed.role.arn=arn:aws:iam::123456789012:role/my-role \
+--conf spark.hadoop.fs.s3a.assumed.role.session.name=my-session \
+--conf spark.hadoop.fs.s3a.assumed.role.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider
+...
+```
+
+#### Limitations
+
+The S3 support of `native_datafusion` has the following limitations:
+
+1. **Partial Hadoop S3A configuration support**: Not all Hadoop S3A configurations are currently supported. Only the configurations listed in the tables above are translated and applied to the underlying `object_store` crate.
+
+2. **Custom credential providers**: Custom implementations of AWS credential providers are not supported. The implementation only supports the standard credential providers listed in the table above. We are planning to add support for custom credential providers through a JNI-based adapter that will allow calling Java credential providers from native code. See [issue #1829](https://github.com/apache/datafusion-comet/issues/1829) for more details.
+
+### `native_iceberg_compat`
+
+The `native_iceberg_compat` Parquet scan implementation does not support reading data from S3 yet, but we are working on it.
diff --git a/searchindex.js b/searchindex.js
index 880ef53ac..567e9c355 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[11, "install-comet"]], 
"2. Clone Spark and Apply Diff": [[11, "clone-spark-and-apply-diff"]], "3. Run 
Spark SQL Tests": [[11, "run-spark-sql-tests"]], "ANSI mode": [[14, 
"ansi-mode"]], "API Differences Between Spark Versions": [[0, 
"api-differences-between-spark-versions"]], "ASF Links": [[13, null]], 
"Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)": [[19, 
null]], "Adding Spark-side Tests for the New Expression":  [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[11, "install-comet"]], 
"2. Clone Spark and Apply Diff": [[11, "clone-spark-and-apply-diff"]], "3. Run 
Spark SQL Tests": [[11, "run-spark-sql-tests"]], "ANSI mode": [[14, 
"ansi-mode"]], "API Differences Between Spark Versions": [[0, 
"api-differences-between-spark-versions"]], "ASF Links": [[13, null]], 
"Accelerating Apache Iceberg Parquet Scans using Comet (Experimental)": [[19, 
null]], "Adding Spark-side Tests for the New Expression":  [...]
\ No newline at end of file
diff --git a/user-guide/datasources.html b/user-guide/datasources.html
index 772109d4c..fcbe7bb5c 100644
--- a/user-guide/datasources.html
+++ b/user-guide/datasources.html
@@ -351,6 +351,57 @@ under the License.
   <a class="reference internal nav-link" href="#s3">
    S3
   </a>
+  <ul class="nav section-nav flex-column">
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#native-comet">
+     <code class="docutils literal notranslate">
+      <span class="pre">
+       native_comet
+      </span>
+     </code>
+    </a>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#native-datafusion">
+     <code class="docutils literal notranslate">
+      <span class="pre">
+       native_datafusion
+      </span>
+     </code>
+    </a>
+    <ul class="nav section-nav flex-column">
+     <li class="toc-h4 nav-item toc-entry">
+      <a class="reference internal nav-link" 
href="#supported-credential-providers">
+       Supported Credential Providers
+      </a>
+     </li>
+     <li class="toc-h4 nav-item toc-entry">
+      <a class="reference internal nav-link" 
href="#additional-s3-configuration-options">
+       Additional S3 Configuration Options
+      </a>
+     </li>
+     <li class="toc-h4 nav-item toc-entry">
+      <a class="reference internal nav-link" href="#examples">
+       Examples
+      </a>
+     </li>
+     <li class="toc-h4 nav-item toc-entry">
+      <a class="reference internal nav-link" href="#limitations">
+       Limitations
+      </a>
+     </li>
+    </ul>
+   </li>
+   <li class="toc-h3 nav-item toc-entry">
+    <a class="reference internal nav-link" href="#native-iceberg-compat">
+     <code class="docutils literal notranslate">
+      <span class="pre">
+       native_iceberg_compat
+      </span>
+     </code>
+    </a>
+   </li>
+  </ul>
  </li>
 </ul>
 
@@ -534,7 +585,131 @@ Input<span class="w"> </span><span class="o">[</span><span class="m">3</span><sp
 </section>
 <section id="s3">
 <h2>S3<a class="headerlink" href="#s3" title="Link to this heading">¶</a></h2>
-<p>In progress</p>
+<p>DataFusion Comet has <a class="reference internal" 
href="compatibility.html#parquet-scans"><span class="std std-ref">multiple 
Parquet scan implementations</span></a> that use different approaches to read 
data from S3.</p>
+<section id="native-comet">
+<h3><code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code><a class="headerlink" 
href="#native-comet" title="Link to this heading">¶</a></h3>
+<p>The default <code class="docutils literal notranslate"><span 
class="pre">native_comet</span></code> Parquet scan implementation reads data 
from S3 using the <a class="reference external" 
href="https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html">Hadoop-AWS
 module</a>, which is identical to the approach commonly used with vanilla 
Spark. AWS credential configuration and other Hadoop S3A configurations work
the same way as in vanilla Spark.</p>
+</section>
+<section id="native-datafusion">
+<h3><code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code><a class="headerlink" 
href="#native-datafusion" title="Link to this heading">¶</a></h3>
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> Parquet scan implementation 
completely offloads data loading to native code. It uses the <a 
class="reference external" href="https://crates.io/crates/object_store"><code
class="docutils literal notranslate"><span 
class="pre">object_store</span></code> crate</a> to read data from S3 and 
supports configuring S3 access using standard <a class="reference external" 
href="https://hadoop.apache.o [...]
+<p>This implementation maintains compatibility with existing Hadoop S3A 
configurations, so existing code will continue to work as long as the 
configurations are supported and can be translated without loss of 
functionality.</p>
+<section id="supported-credential-providers">
+<h4>Supported Credential Providers<a class="headerlink" 
href="#supported-credential-providers" title="Link to this heading">¶</a></h4>
+<p>AWS credential providers can be configured using the <code class="docutils 
literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> configuration. The 
following table shows the supported credential providers and their 
configuration options:</p>
+<table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Credential provider</p></th>
+<th class="head"><p>Description</p></th>
+<th class="head"><p>Supported Options</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</span></code></p></td>
+<td><p>Access S3 using access key and secret key</p></td>
+<td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.access.key</span></code>, <code class="docutils literal 
notranslate"><span class="pre">fs.s3a.secret.key</span></code></p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider</span></code></p></td>
+<td><p>Access S3 using temporary credentials</p></td>
+<td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.access.key</span></code>, <code class="docutils literal 
notranslate"><span class="pre">fs.s3a.secret.key</span></code>, <code 
class="docutils literal notranslate"><span 
class="pre">fs.s3a.session.token</span></code></p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider</span></code></p></td>
+<td><p>Access S3 using AWS STS assume role</p></td>
+<td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.assumed.role.arn</span></code>, <code class="docutils 
literal notranslate"><span 
class="pre">fs.s3a.assumed.role.session.name</span></code> (optional), <code 
class="docutils literal notranslate"><span 
class="pre">fs.s3a.assumed.role.credentials.provider</span></code> 
(optional)</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider</span></code></p></td>
+<td><p>Access S3 using EC2 instance profile or ECS task credentials (tries ECS 
first, then IMDS)</p></td>
+<td><p>None (auto-detected)</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</span></code><br/><code
 class="docutils literal notranslate"><span 
class="pre">com.amazonaws.auth.AnonymousAWSCredentials</span></code><br/><code 
class="docutils literal notranslate"><span 
class="pre">software.amazon.awssdk.auth.credentials.AnonymousCredentialsProvider</span></code></p></td>
+<td><p>Access S3 without authentication (public buckets only)</p></td>
+<td><p>None</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">com.amazonaws.auth.EnvironmentVariableCredentialsProvider</span></code><br/><code
 class="docutils literal notranslate"><span 
class="pre">software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider</span></code></p></td>
+<td><p>Load credentials from environment variables (<code class="docutils 
literal notranslate"><span class="pre">AWS_ACCESS_KEY_ID</span></code>, <code 
class="docutils literal notranslate"><span 
class="pre">AWS_SECRET_ACCESS_KEY</span></code>, <code class="docutils literal 
notranslate"><span class="pre">AWS_SESSION_TOKEN</span></code>)</p></td>
+<td><p>None</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">com.amazonaws.auth.InstanceProfileCredentialsProvider</span></code><br/><code
 class="docutils literal notranslate"><span 
class="pre">software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider</span></code></p></td>
+<td><p>Access S3 using EC2 instance metadata service (IMDS)</p></td>
+<td><p>None</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">com.amazonaws.auth.ContainerCredentialsProvider</span></code><br/><code
 class="docutils literal notranslate"><span 
class="pre">software.amazon.awssdk.auth.credentials.ContainerCredentialsProvider</span></code><br/><code
 class="docutils literal notranslate"><span 
class="pre">com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper</span></code></p></td>
+<td><p>Access S3 using ECS task credentials</p></td>
+<td><p>None</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">com.amazonaws.auth.WebIdentityTokenCredentialsProvider</span></code><br/><code
 class="docutils literal notranslate"><span 
class="pre">software.amazon.awssdk.auth.credentials.WebIdentityTokenFileCredentialsProvider</span></code></p></td>
+<td><p>Authenticate using web identity token file</p></td>
+<td><p>None</p></td>
+</tr>
+</tbody>
+</table>
+<p>Multiple credential providers can be specified as a comma-separated list in the <code class="docutils literal notranslate"><span class="pre">fs.s3a.aws.credentials.provider</span></code> configuration, as Hadoop-AWS also allows. If <code class="docutils literal notranslate"><span class="pre">fs.s3a.aws.credentials.provider</span></code> is not configured, Hadoop S3A’s default credential provider chain is used. All configuration
options also support bucket-specific overrides  [...]
+</section>
+<section id="additional-s3-configuration-options">
+<h4>Additional S3 Configuration Options<a class="headerlink" 
href="#additional-s3-configuration-options" title="Link to this 
heading">¶</a></h4>
+<p>Beyond credential providers, the <code class="docutils literal 
notranslate"><span class="pre">native_datafusion</span></code> implementation 
supports additional S3 configuration options:</p>
+<table class="table">
+<thead>
+<tr class="row-odd"><th class="head"><p>Option</p></th>
+<th class="head"><p>Description</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.endpoint</span></code></p></td>
+<td><p>The endpoint of the S3 service</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.endpoint.region</span></code></p></td>
+<td><p>The AWS region for the S3 service. If not specified, the region will be 
auto-detected.</p></td>
+</tr>
+<tr class="row-even"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.path.style.access</span></code></p></td>
+<td><p>Whether to use path style access for the S3 service (true/false, 
defaults to virtual hosted style)</p></td>
+</tr>
+<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span 
class="pre">fs.s3a.requester.pays.enabled</span></code></p></td>
+<td><p>Whether to enable requester pays for S3 requests (true/false)</p></td>
+</tr>
+</tbody>
+</table>
+<p>All configuration options support bucket-specific overrides using the 
pattern <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.bucket.{bucket-name}.{option}</span></code>.</p>
+</section>
+<section id="examples">
+<h4>Examples<a class="headerlink" href="#examples" title="Link to this 
heading">¶</a></h4>
+<p>The following examples demonstrate how to configure S3 access with the 
<code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> Parquet scan implementation using 
different authentication methods.</p>
+<p><strong>Example 1: Simple Credentials</strong></p>
+<p>This example shows how to access a private S3 bucket using an access key 
and secret key. The <code class="docutils literal notranslate"><span 
class="pre">fs.s3a.aws.credentials.provider</span></code> configuration can be 
omitted since <code class="docutils literal notranslate"><span 
class="pre">org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</span></code> 
is included in Hadoop S3A’s default credential provider chain.</p>
+<div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span><span 
class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span 
class="se">\</span>
+...
+--conf<span class="w"> </span>spark.comet.scan.impl<span 
class="o">=</span>native_datafusion<span class="w"> </span><span 
class="se">\</span>
+--conf<span class="w"> </span>spark.hadoop.fs.s3a.access.key<span 
class="o">=</span>my-access-key<span class="w"> </span><span class="se">\</span>
+--conf<span class="w"> </span>spark.hadoop.fs.s3a.secret.key<span 
class="o">=</span>my-secret-key
+...
+</pre></div>
+</div>
+<p><strong>Example 2: Assume Role with Web Identity Token</strong></p>
+<p>This example demonstrates using an assumed role credential to access a 
private S3 bucket, where the base credential for assuming the role is provided 
by a web identity token credentials provider.</p>
+<div class="highlight-shell notranslate"><div 
class="highlight"><pre><span></span><span 
class="nv">$SPARK_HOME</span>/bin/spark-shell<span class="w"> </span><span 
class="se">\</span>
+...
+--conf<span class="w"> </span>spark.comet.scan.impl<span 
class="o">=</span>native_datafusion<span class="w"> </span><span 
class="se">\</span>
+--conf<span class="w"> 
</span>spark.hadoop.fs.s3a.aws.credentials.provider<span 
class="o">=</span>org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider<span
 class="w"> </span><span class="se">\</span>
+--conf<span class="w"> </span>spark.hadoop.fs.s3a.assumed.role.arn<span 
class="o">=</span>arn:aws:iam::123456789012:role/my-role<span class="w"> 
</span><span class="se">\</span>
+--conf<span class="w"> 
</span>spark.hadoop.fs.s3a.assumed.role.session.name<span 
class="o">=</span>my-session<span class="w"> </span><span class="se">\</span>
+--conf<span class="w"> 
</span>spark.hadoop.fs.s3a.assumed.role.credentials.provider<span 
class="o">=</span>com.amazonaws.auth.WebIdentityTokenCredentialsProvider
+...
+</pre></div>
+</div>
+</section>
+<section id="limitations">
+<h4>Limitations<a class="headerlink" href="#limitations" title="Link to this 
heading">¶</a></h4>
+<p>The S3 support of <code class="docutils literal notranslate"><span 
class="pre">native_datafusion</span></code> has the following limitations:</p>
+<ol class="arabic simple">
+<li><p><strong>Partial Hadoop S3A configuration support</strong>: Not all 
Hadoop S3A configurations are currently supported. Only the configurations 
listed in the tables above are translated and applied to the underlying <code 
class="docutils literal notranslate"><span 
class="pre">object_store</span></code> crate.</p></li>
+<li><p><strong>Custom credential providers</strong>: Custom implementations of 
AWS credential providers are not supported. The implementation only supports 
the standard credential providers listed in the table above. We are planning to 
add support for custom credential providers through a JNI-based adapter that 
will allow calling Java credential providers from native code. See <a 
class="reference external" 
href="https://github.com/apache/datafusion-comet/issues/1829">issue #1829</a>
for  [...]
+</ol>
+</section>
+</section>
+<section id="native-iceberg-compat">
+<h3><code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code><a class="headerlink" 
href="#native-iceberg-compat" title="Link to this heading">¶</a></h3>
+<p>The <code class="docutils literal notranslate"><span 
class="pre">native_iceberg_compat</span></code> Parquet scan implementation 
does not support reading data from S3 yet, but we are working on it.</p>
+</section>
 </section>
 </section>
 

