This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 8476e75d23 Publish built docs triggered by a91e0421ebadf3a155508e28e272f5fb8356bca1
8476e75d23 is described below

commit 8476e75d23c67d0f3224a4272e579609158f04aa
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Jun 11 08:49:35 2025 +0000

    Publish built docs triggered by a91e0421ebadf3a155508e28e272f5fb8356bca1
---
 _sources/user-guide/cli/datasources.md.txt | 87 +++++++++++++++++++++---------
 searchindex.js                             |  2 +-
 user-guide/cli/datasources.html            | 75 ++++++++++++++++++++------
 3 files changed, 123 insertions(+), 41 deletions(-)

diff --git a/_sources/user-guide/cli/datasources.md.txt b/_sources/user-guide/cli/datasources.md.txt
index 2e14f1f54c..afc4f6c0c5 100644
--- a/_sources/user-guide/cli/datasources.md.txt
+++ b/_sources/user-guide/cli/datasources.md.txt
@@ -82,22 +82,29 @@ select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_par
 To read from an AWS S3 or GCS, use `s3` or `gs` as a protocol prefix. For
 example, to read a file in an S3 bucket named `my-data-bucket` use the URL
 `s3://my-data-bucket` and set the relevant access credentials as environment
-variables (e.g. for AWS S3 you need to at least `AWS_ACCESS_KEY_ID` and
+variables (e.g. for AWS S3 you can use `AWS_ACCESS_KEY_ID` and
 `AWS_SECRET_ACCESS_KEY`).
 
 ```sql
-select count(*) from 's3://my-data-bucket/athena_partitioned/hits.parquet'
+> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
++------------+
+| count(*)   |
++------------+
+| 1310903963 |
++------------+
 ```
 
-See the [`CREATE EXTERNAL TABLE`](#create-external-table) section for
+See the [`CREATE EXTERNAL TABLE`](#create-external-table) section below for
 additional configuration options.
 
 # `CREATE EXTERNAL TABLE`
 
 It is also possible to create a table backed by files or remote locations via
-`CREATE EXTERNAL TABLE` as shown below. Note that DataFusion does not support wildcards (e.g. `*`) in file paths; instead, specify the directory path directly to read all compatible files in that directory.
+`CREATE EXTERNAL TABLE` as shown below. Note that DataFusion does not support
+wildcards (e.g. `*`) in file paths; instead, specify the directory path directly
+to read all compatible files in that directory.
 
-For example, to create a table `hits` backed by a local parquet file, use:
+For example, to create a table `hits` backed by a local parquet file named `hits.parquet`:
 
 ```sql
 CREATE EXTERNAL TABLE hits
@@ -105,7 +112,7 @@ STORED AS PARQUET
 LOCATION 'hits.parquet';
 ```
 
-To create a table `hits` backed by a remote parquet file via HTTP(S), use
+To create a table `hits` backed by a remote parquet file via HTTP(S):
 
 ```sql
 CREATE EXTERNAL TABLE hits
@@ -127,7 +134,11 @@ select count(*) from hits;
 
 **Why Wildcards Are Not Supported**
 
-Although wildcards (e.g., _.parquet or \*\*/_.parquet) may work for local filesystems in some cases, they are not officially supported by DataFusion. This is because wildcards are not universally applicable across all storage backends (e.g., S3, GCS). Instead, DataFusion expects the user to specify the directory path, and it will automatically read all compatible files within that directory.
+Although wildcards (e.g., `*.parquet` or `**/*.parquet`) may work for local
+filesystems in some cases, they are not supported by DataFusion CLI. This
+is because wildcards are not universally applicable across all storage backends
+(e.g., S3, GCS). Instead, DataFusion expects the user to specify the directory
+path, and it will automatically read all compatible files within that directory.
 
 For example, the following usage is not supported:
 
@@ -148,7 +159,7 @@ CREATE EXTERNAL TABLE test (
   day DATE
 )
 STORED AS PARQUET
-LOCATION 'gs://bucket/my_table';
+LOCATION 'gs://bucket/my_table/';
 ```
 
 # Formats
@@ -168,6 +179,11 @@ LOCATION '/mnt/nyctaxi/tripdata.parquet';
 Register a single folder parquet datasource. Note: All files inside must be valid
 parquet files and have compatible schemas
 
+:::{note}
+Paths must end in slash `/`
+: The path must end in `/`, otherwise DataFusion will treat the path as a file and not a directory
+:::
+
 ```sql
 CREATE EXTERNAL TABLE taxi
 STORED AS PARQUET
@@ -178,7 +194,7 @@ LOCATION '/mnt/nyctaxi/';
 
 DataFusion will infer the CSV schema automatically or you can provide it explicitly.
 
-Register a single file csv datasource with a header row.
+Register a single file csv datasource with a header row:
 
 ```sql
 CREATE EXTERNAL TABLE test
 STORED AS CSV
 LOCATION '/path/to/aggregate_test_100.csv'
 OPTIONS ('has_header' 'true');
 ```
 
-Register a single file csv datasource with explicitly defined schema.
+Register a single file csv datasource with explicitly defined schema:
 
 ```sql
 CREATE EXTERNAL TABLE test (
@@ -213,7 +229,7 @@ LOCATION '/path/to/aggregate_test_100.csv';
 
 ## HTTP(s)
 
-To read from a remote parquet file via HTTP(S) you can use the following:
+To read from a remote parquet file via HTTP(S):
 
 ```sql
 CREATE EXTERNAL TABLE hits
@@ -223,9 +239,12 @@ LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hit
 
 ## S3
 
-[AWS S3](https://aws.amazon.com/s3/) data sources must have connection credentials configured.
+DataFusion CLI supports configuring [AWS S3](https://aws.amazon.com/s3/) via the
+`CREATE EXTERNAL TABLE` statement and standard AWS configuration methods (via the
+[`aws-config`] AWS SDK crate).
 
-To create an external table from a file in an S3 bucket:
+To create an external table from a file in an S3 bucket with explicit
+credentials:
 
 ```sql
 CREATE EXTERNAL TABLE test
@@ -238,7 +257,7 @@ OPTIONS(
 LOCATION 's3://bucket/path/file.parquet';
 ```
 
-It is also possible to specify the access information using environment variables:
+To create an external table using environment variables:
 
 ```bash
 $ export AWS_DEFAULT_REGION=us-east-2
@@ -247,7 +266,7 @@ $ export AWS_ACCESS_KEY_ID=******
 
 $ datafusion-cli
 datafusion-cli v21.0.0
-> create external table test stored as parquet location 's3://bucket/path/file.parquet';
+> CREATE EXTERNAL TABLE test STORED AS PARQUET LOCATION 's3://bucket/path/file.parquet';
 0 rows in set. Query took 0.374 seconds.
 > select * from test;
 +----------+----------+
@@ -258,19 +277,39 @@ $ datafusion-cli
 1 row in set. Query took 0.171 seconds.
 ```
 
+To read from a public S3 bucket without signatures, use the
+`aws.SKIP_SIGNATURE` option:
+
+```sql
+CREATE EXTERNAL TABLE nyc_taxi_rides
+STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/'
+OPTIONS(aws.SKIP_SIGNATURE true);
+```
+
+Credentials are taken in this order of precedence:
+
+1. Explicitly specified in the `OPTIONS` clause of the `CREATE EXTERNAL TABLE` statement.
+2. Determined by the [`aws-config`] crate (standard environment variables such as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, as well as other AWS-specific features).
+
+If no credentials are specified, DataFusion CLI will use unsigned requests to S3,
+which allows reading from public buckets.
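As an illustration, the same public dataset should also be readable via the `aws.nosign` alias that
the table below lists for `skip_signature`; this is a minimal sketch mirroring the statement above,
and the exact spelling of the alias in `OPTIONS` is an assumption based on that table entry:

```sql
-- Sketch: 'aws.nosign' is listed below as an alias for 'aws.skip_signature';
-- the bucket is the same public NYC taxi dataset used in the examples above.
CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/'
OPTIONS(aws.nosign true);
```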
+
 Supported configuration options are:
 
-| Environment Variable                     | Configuration Option    | Description                                           |
-| ---------------------------------------- | ----------------------- | ----------------------------------------------------- |
-| `AWS_ACCESS_KEY_ID`                      | `aws.access_key_id`     |                                                       |
-| `AWS_SECRET_ACCESS_KEY`                  | `aws.secret_access_key` |                                                       |
-| `AWS_DEFAULT_REGION`                     | `aws.region`            |                                                       |
-| `AWS_ENDPOINT`                           | `aws.endpoint`          |                                                       |
-| `AWS_SESSION_TOKEN`                      | `aws.token`             |                                                       |
-| `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` |                         | See [IAM Roles]                                       |
-| `AWS_ALLOW_HTTP`                         |                         | set to "true" to permit HTTP connections without TLS  |
+| Environment Variable                     | Configuration Option    | Description                                     |
+| ---------------------------------------- | ----------------------- | ----------------------------------------------- |
+| `AWS_ACCESS_KEY_ID`                      | `aws.access_key_id`     |                                                 |
+| `AWS_SECRET_ACCESS_KEY`                  | `aws.secret_access_key` |                                                 |
+| `AWS_DEFAULT_REGION`                     | `aws.region`            |                                                 |
+| `AWS_ENDPOINT`                           | `aws.endpoint`          |                                                 |
+| `AWS_SESSION_TOKEN`                      | `aws.token`             |                                                 |
+| `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` |                         | See [IAM Roles]                                 |
+| `AWS_ALLOW_HTTP`                         |                         | If "true", permit HTTP connections without TLS |
+| `AWS_SKIP_SIGNATURE`                     | `aws.skip_signature`    | If "true", does not sign requests               |
+|                                          | `aws.nosign`            | Alias for `skip_signature`                      |
 
 [iam roles]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
+[`aws-config`]: https://docs.rs/aws-config/latest/aws_config/
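The `aws.endpoint` option in the table above can also point the CLI at an S3-compatible service. A
minimal sketch follows; the endpoint URL, bucket, and credentials are illustrative placeholders, not
values from these docs:

```sql
-- Sketch: route requests to a custom S3-compatible endpoint using the
-- documented 'aws.endpoint' and 'aws.region' options; values are placeholders.
CREATE EXTERNAL TABLE test
STORED AS PARQUET
OPTIONS(
    'aws.access_key_id' '******',
    'aws.secret_access_key' '******',
    'aws.region' 'us-east-2',
    'aws.endpoint' 'https://storage.example.com'
)
LOCATION 's3://bucket/path/file.parquet';
```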
 
 ## OSS

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org