This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new f364af264c Publish built docs triggered by c6fa2659818ca27e854bbd0cf6960e1b1906e0af
f364af264c is described below
commit f364af264c2a0687eb4b60eedc87cd6ab5f5683a
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Mon May 8 15:26:20 2023 +0000
Publish built docs triggered by c6fa2659818ca27e854bbd0cf6960e1b1906e0af
---
_sources/user-guide/sql/ddl.md.txt | 58 +++++++++++++++++++++++++++++++++-----
searchindex.js | 2 +-
user-guide/sql/ddl.html | 51 ++++++++++++++++++++++++++++-----
3 files changed, 96 insertions(+), 15 deletions(-)
diff --git a/_sources/user-guide/sql/ddl.md.txt b/_sources/user-guide/sql/ddl.md.txt
index 6c8fbcab68..0dcc4517b5 100644
--- a/_sources/user-guide/sql/ddl.md.txt
+++ b/_sources/user-guide/sql/ddl.md.txt
@@ -47,8 +47,41 @@ CREATE SCHEMA cat.emu;
## CREATE EXTERNAL TABLE
-Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary
-to provide schema information for Parquet files.
+The `CREATE EXTERNAL TABLE` SQL statement registers a location on a local
+file system or remote object store as a named table which can be queried.
+
+The supported syntax is:
+
+```
+CREATE EXTERNAL TABLE
+[ IF NOT EXISTS ]
+<TABLE_NAME>[ (<column_definition>) ]
+STORED AS <file_type>
+[ WITH HEADER ROW ]
+[ DELIMITER <char> ]
+[ COMPRESSION TYPE <GZIP | BZIP2 | XZ | ZSTD> ]
+[ PARTITIONED BY (<column_list>) ]
+[ WITH ORDER (<ordered_column_list>) ]
+[ OPTIONS (<key_value_list>) ]
+LOCATION <literal>
+
+<column_definition> := (<column_name> <data_type>, ...)
+
+<column_list> := (<column_name>, ...)
+
+<ordered_column_list> := (<column_name> <sort_clause>, ...)
+
+<key_value_list> := (<literal> <literal>, <literal> <literal>, ...)
+```
+
+`file_type` is one of `CSV`, `PARQUET`, `AVRO` or `JSON`.
+
+`LOCATION <literal>` specifies the location to find the data. It can be
+a path to a file or directory of partitioned files locally or on an
+object store.
+
+Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement such as the following. It is not necessary to
+provide schema information for Parquet files.
```sql
CREATE EXTERNAL TABLE taxi
@@ -56,8 +89,8 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
-CSV data sources can also be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema will be
-inferred based on scanning a subset of the file.
+CSV data sources can also be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema will be inferred based on
+scanning a subset of the file.
```sql
CREATE EXTERNAL TABLE test
@@ -89,9 +122,20 @@ WITH HEADER ROW
LOCATION '/path/to/aggregate_test_100.csv';
```
-When creating an output from a data source that is already ordered by an expression, you can pre-specify the order of
-the data using the `WITH ORDER` clause. This applies even if the expression used for sorting is complex,
-allowing for greater flexibility.
+It is also possible to specify a directory that contains a partitioned
+table (multiple files with the same schema):
+
+```sql
+CREATE EXTERNAL TABLE test
+STORED AS CSV
+WITH HEADER ROW
+LOCATION '/path/to/directory/of/files';
+```
+
+When creating an output from a data source that is already ordered by
+an expression, you can pre-specify the order of the data using the
+`WITH ORDER` clause. This applies even if the expression used for
+sorting is complex, allowing for greater flexibility.
Here's an example of how to use `WITH ORDER` clause.
diff --git a/searchindex.js b/searchindex.js
index 4a4ffccef4..5f85e3f676 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/architecture",
"contributor-guide/communication", "contributor-guide/index",
"contributor-guide/quarterly_roadmap", "contributor-guide/roadmap",
"contributor-guide/specification/index",
"contributor-guide/specification/invariants",
"contributor-guide/specification/output-field-name-semantic", "index",
"user-guide/cli", "user-guide/configs", "user-guide/dataframe",
"user-guide/example-usage", "user-guide/expressions", "user-guide/faq", "use
[...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/architecture",
"contributor-guide/communication", "contributor-guide/index",
"contributor-guide/quarterly_roadmap", "contributor-guide/roadmap",
"contributor-guide/specification/index",
"contributor-guide/specification/invariants",
"contributor-guide/specification/output-field-name-semantic", "index",
"user-guide/cli", "user-guide/configs", "user-guide/dataframe",
"user-guide/example-usage", "user-guide/expressions", "user-guide/faq", "use
[...]
\ No newline at end of file
diff --git a/user-guide/sql/ddl.html b/user-guide/sql/ddl.html
index afc67d384c..e2418a5021 100644
--- a/user-guide/sql/ddl.html
+++ b/user-guide/sql/ddl.html
@@ -392,15 +392,43 @@ CREATE SCHEMA [ IF NOT EXISTS ] [ <i><b>catalog.</i></b>
] <b><i>schema_name</i>
</section>
<section id="create-external-table">
<h2>CREATE EXTERNAL TABLE<a class="headerlink" href="#create-external-table"
title="Permalink to this heading">¶</a></h2>
-<p>Parquet data sources can be registered by executing a <code class="docutils
literal notranslate"><span class="pre">CREATE</span> <span
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL
statement. It is not necessary
-to provide schema information for Parquet files.</p>
+<p>The <code class="docutils literal notranslate"><span class="pre">CREATE</span>
<span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL
statement registers a location on a local
+file system or remote object store as a named table which can be queried.</p>
+<p>The supported syntax is:</p>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span><span class="n">CREATE</span> <span
class="n">EXTERNAL</span> <span class="n">TABLE</span>
+<span class="p">[</span> <span class="n">IF</span> <span class="n">NOT</span>
<span class="n">EXISTS</span> <span class="p">]</span>
+<span class="o"><</span><span class="n">TABLE_NAME</span><span
class="o">></span><span class="p">[</span> <span class="p">(</span><span
class="o"><</span><span class="n">column_definition</span><span
class="o">></span><span class="p">)</span> <span class="p">]</span>
+<span class="n">STORED</span> <span class="n">AS</span> <span
class="o"><</span><span class="n">file_type</span><span class="o">></span>
+<span class="p">[</span> <span class="n">WITH</span> <span
class="n">HEADER</span> <span class="n">ROW</span> <span class="p">]</span>
+<span class="p">[</span> <span class="n">DELIMITER</span> <span
class="o"><</span><span class="n">char</span><span class="o">></span>
<span class="p">]</span>
+<span class="p">[</span> <span class="n">COMPRESSION</span> <span
class="n">TYPE</span> <span class="o"><</span><span class="n">GZIP</span>
<span class="o">|</span> <span class="n">BZIP2</span> <span class="o">|</span>
<span class="n">XZ</span> <span class="o">|</span> <span
class="n">ZSTD</span><span class="o">></span> <span class="p">]</span>
+<span class="p">[</span> <span class="n">PARTITIONED</span> <span
class="n">BY</span> <span class="p">(</span><span class="o"><</span><span
class="n">column</span> <span class="nb">list</span><span
class="o">></span><span class="p">)</span> <span class="p">]</span>
+<span class="p">[</span> <span class="n">WITH</span> <span
class="n">ORDER</span> <span class="p">(</span><span class="o"><</span><span
class="n">ordered</span> <span class="n">column</span> <span
class="nb">list</span><span class="o">></span><span class="p">)</span> <span class="p">]</span>
+<span class="p">[</span> <span class="n">OPTIONS</span> <span
class="p">(</span><span class="o"><</span><span
class="n">key_value_list</span><span class="o">></span><span
class="p">)</span> <span class="p">]</span>
+<span class="n">LOCATION</span> <span class="o"><</span><span
class="n">literal</span><span class="o">></span>
+
+<span class="o"><</span><span class="n">column_definition</span><span
class="o">></span> <span class="o">:=</span> <span class="p">(</span><span
class="o"><</span><span class="n">column_name</span><span
class="o">></span> <span class="o"><</span><span
class="n">data_type</span><span class="o">></span><span class="p">,</span>
<span class="o">...</span><span class="p">)</span>
+
+<span class="o"><</span><span class="n">column_list</span><span
class="o">></span> <span class="o">:=</span> <span class="p">(</span><span
class="o"><</span><span class="n">column_name</span><span
class="o">></span><span class="p">,</span> <span class="o">...</span><span
class="p">)</span>
+
+<span class="o"><</span><span class="n">ordered_column_list</span><span
class="o">></span> <span class="o">:=</span> <span class="p">(</span><span
class="o"><</span><span class="n">column_name</span><span
class="o">></span> <span class="o"><</span><span
class="n">sort_clause</span><span class="o">></span><span class="p">,</span>
<span class="o">...</span><span class="p">)</span>
+
+<span class="o"><</span><span class="n">key_value_list</span><span
class="o">></span> <span class="o">:=</span> <span class="p">(</span><span
class="o"><</span><span class="n">literal</span><span class="o">></span>
<span class="o"><</span><span class="n">literal</span><span class="o">></span><span
class="p">,</span> <span class="o"><</span><span
class="n">literal</span><span class="o">></span> <span
class="o"><</span><span class="n">literal</span><span
class="o">></span><span class="p [...]
+</pre></div>
+</div>
+<p><code class="docutils literal notranslate"><span
class="pre">file_type</span></code> is one of <code class="docutils literal
notranslate"><span class="pre">CSV</span></code>, <code class="docutils literal
notranslate"><span class="pre">PARQUET</span></code>, <code class="docutils
literal notranslate"><span class="pre">AVRO</span></code> or <code
class="docutils literal notranslate"><span class="pre">JSON</span></code>.</p>
+<p><code class="docutils literal notranslate"><span
class="pre">LOCATION</span> <span class="pre"><literal></span></code>
specifies the location to find the data. It can be
+a path to a file or directory of partitioned files locally or on an
+object store.</p>
+<p>Parquet data sources can be registered by executing a <code class="docutils
literal notranslate"><span class="pre">CREATE</span> <span
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement
such as the following. It is not necessary to
+provide schema information for Parquet files.</p>
<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="k">CREATE</span><span
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span
class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
<span class="n">STORED</span><span class="w"> </span><span
class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
<span class="k">LOCATION</span><span class="w"> </span><span
class="s1">'/mnt/nyctaxi/tripdata.parquet'</span><span
class="p">;</span>
</pre></div>
</div>
-<p>CSV data sources can also be registered by executing a <code
class="docutils literal notranslate"><span class="pre">CREATE</span> <span
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL
statement. The schema will be
-inferred based on scanning a subset of the file.</p>
+<p>CSV data sources can also be registered by executing a <code
class="docutils literal notranslate"><span class="pre">CREATE</span> <span
class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL
statement. The schema will be inferred based on
+scanning a subset of the file.</p>
<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="k">CREATE</span><span
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span
class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
<span class="n">STORED</span><span class="w"> </span><span
class="k">AS</span><span class="w"> </span><span class="n">CSV</span>
<span class="k">WITH</span><span class="w"> </span><span
class="n">HEADER</span><span class="w"> </span><span class="k">ROW</span>
@@ -428,9 +456,18 @@ inferred based on scanning a subset of the file.</p>
<span class="k">LOCATION</span><span class="w"> </span><span
class="s1">'/path/to/aggregate_test_100.csv'</span><span
class="p">;</span>
</pre></div>
</div>
-<p>When creating an output from a data source that is already ordered by an
expression, you can pre-specify the order of
-the data using the <code class="docutils literal notranslate"><span
class="pre">WITH</span> <span class="pre">ORDER</span></code> clause. This
applies even if the expression used for sorting is complex,
-allowing for greater flexibility.</p>
+<p>It is also possible to specify a directory that contains a partitioned
+table (multiple files with the same schema):</p>
+<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="k">CREATE</span><span
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span
class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
+<span class="n">STORED</span><span class="w"> </span><span
class="k">AS</span><span class="w"> </span><span class="n">CSV</span>
+<span class="k">WITH</span><span class="w"> </span><span
class="n">HEADER</span><span class="w"> </span><span class="k">ROW</span>
+<span class="k">LOCATION</span><span class="w"> </span><span
class="s1">'/path/to/directory/of/files'</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>When creating an output from a data source that is already ordered by
+an expression, you can pre-specify the order of the data using the
+<code class="docutils literal notranslate"><span class="pre">WITH</span> <span
class="pre">ORDER</span></code> clause. This applies even if the expression
used for
+sorting is complex, allowing for greater flexibility.</p>
<p>Here’s an example of how to use <code class="docutils literal
notranslate"><span class="pre">WITH</span> <span
class="pre">ORDER</span></code> clause.</p>
<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="k">CREATE</span><span
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span
class="k">TABLE</span><span class="w"> </span><span class="n">test</span><span
class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">c1</span><span class="w">
</span><span class="nb">VARCHAR</span><span class="w"> </span><span
class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span
class="p">,</span>