This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 8c020b6eb5 Publish built docs triggered by
85b4e40df9e9a5a71c08760452c2059a271313d1
8c020b6eb5 is described below
commit 8c020b6eb5a1180325abce4c363136084bcd77ea
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Sun Apr 7 17:40:15 2024 +0000
Publish built docs triggered by 85b4e40df9e9a5a71c08760452c2059a271313d1
---
_sources/user-guide/sql/dml.md.txt | 39 ++++++++++++++---
_sources/user-guide/sql/write_options.md.txt | 45 ++++++++++----------
searchindex.js | 2 +-
user-guide/sql/dml.html | 51 ++++++++++++++++++++---
user-guide/sql/write_options.html | 62 +++++++++-------------------
5 files changed, 120 insertions(+), 79 deletions(-)
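
Note on the documented behavior: the dml.md change below states that the `COPY` output format is chosen by the first match among `STORED AS`, the `FORMAT` option, and the filename extension, and that `PARTITIONED BY` produces hive-style `column=value` directories. The following is a minimal Python sketch of those two rules as described in the text, not DataFusion's implementation. The function names, the `EXTENSION_FORMATS` mapping, and the error message are hypothetical and exist only for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional, Sequence, Tuple

# Hypothetical extension table for illustration only; the commit text
# confirms just the `.parquet` -> PARQUET example.
EXTENSION_FORMATS = {".parquet": "PARQUET", ".json": "JSON", ".csv": "CSV"}


def resolve_format(
    file_name: str,
    stored_as: Optional[str] = None,
    format_option: Optional[str] = None,
) -> str:
    """Return the output format using the documented first-match order."""
    # Rule 1: an explicit STORED AS clause wins.
    if stored_as:
        return stored_as.upper()
    # Rule 2: otherwise use the FORMAT entry from OPTIONS, if present.
    if format_option:
        return format_option.upper()
    # Rule 3: otherwise fall back to the filename extension.
    extension = PurePosixPath(file_name).suffix.lower()
    if extension in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[extension]
    raise ValueError(f"cannot infer output format for {file_name!r}")


def partition_dir(base: str, partition_values: Sequence[Tuple[str, str]]) -> str:
    """Build the hive-style directory for one combination of partition values."""
    segments = [f"{column}={value}" for column, value in partition_values]
    return "/".join([base.rstrip("/"), *segments])


if __name__ == "__main__":
    print(resolve_format("foo.parquet"))
    print(partition_dir("dir_name", [("column1", "x"), ("column2", "a")]))
```

Under these assumptions, a target such as `test/table_with_options` has no extension, so the sketch raises unless `STORED AS` or a `FORMAT` option is supplied.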
diff --git a/_sources/user-guide/sql/dml.md.txt
b/_sources/user-guide/sql/dml.md.txt
index 79c36092fd..666e86b460 100644
--- a/_sources/user-guide/sql/dml.md.txt
+++ b/_sources/user-guide/sql/dml.md.txt
@@ -35,8 +35,22 @@ TO '<i><b>file_name</i></b>'
[ OPTIONS( <i><b>option</i></b> [, ... ] ) ]
</pre>
+`STORED AS` specifies the file format the `COPY` command will write. If this
+clause is not specified, it will be inferred from the file extension if
possible.
+
+`PARTITIONED BY` specifies the columns to use for partitioning the output
files into
+separate hive-style directories.
+
+The output format is determined by the first match of the following rules:
+
+1. Value of `STORED AS`
+2. Value of the `OPTION (FORMAT ..)`
+3. Filename extension (e.g. `foo.parquet` implies `PARQUET` format)
+
For a detailed list of valid OPTIONS, see [Write Options](write_options).
+### Examples
+
Copy the contents of `source_table` to `file_name.json` in JSON format:
```sql
@@ -72,6 +86,23 @@ of hive-style partitioned parquet files:
+-------+
```
+If the data contains values of `x` and `y` in column1 and only `a` in
+column2, output files will appear in the following directory structure:
+
+```
+dir_name/
+ column1=x/
+ column2=a/
+ <file>.parquet
+ <file>.parquet
+ ...
+ column1=y/
+ column2=a/
+ <file>.parquet
+ <file>.parquet
+ ...
+```
+
Run the query `SELECT * from source ORDER BY time` and write the
results (maintaining the order) to a parquet file named
`output.parquet` with a maximum parquet row group size of 10MB:
@@ -85,14 +116,10 @@ results (maintaining the order) to a parquet file named
+-------+
```
-The output format is determined by the first match of the following rules:
-
-1. Value of `STORED AS`
-2. Value of the `OPTION (FORMAT ..)`
-3. Filename extension (e.g. `foo.parquet` implies `PARQUET` format)
-
## INSERT
+### Examples
+
Insert values into a table.
<pre>
diff --git a/_sources/user-guide/sql/write_options.md.txt
b/_sources/user-guide/sql/write_options.md.txt
index ac0a41a97f..5c204d8fc0 100644
--- a/_sources/user-guide/sql/write_options.md.txt
+++ b/_sources/user-guide/sql/write_options.md.txt
@@ -35,44 +35,41 @@ If inserting to an external table, table specific write
options can be specified
```sql
CREATE EXTERNAL TABLE
-my_table(a bigint, b bigint)
-STORED AS csv
-COMPRESSION TYPE gzip
-WITH HEADER ROW
-DELIMITER ';'
-LOCATION '/test/location/my_csv_table/'
-OPTIONS(
-NULL_VALUE 'NAN'
-);
+ my_table(a bigint, b bigint)
+ STORED AS csv
+ COMPRESSION TYPE gzip
+ WITH HEADER ROW
+ DELIMITER ';'
+ LOCATION '/test/location/my_csv_table/'
+ OPTIONS(
+ NULL_VALUE 'NAN'
+ )
```
When running `INSERT INTO my_table ...`, the options from the `CREATE TABLE`
will be respected (gzip compression, special delimiter, and header row
included). There will be a single output file if the output path doesn't have
folder format, i.e. ending with a `/`. Note that compression, header, and
delimiter settings can also be specified within the `OPTIONS` tuple list.
Dedicated syntax within the SQL statement always takes precedence over
arbitrary option tuples, so if both are specifi [...]
Finally, options can be passed when running a `COPY` command.
+<!--
+ Test the following example with:
+ CREATE TABLE source_table AS VALUES ('1','2','3','4');
+-->
+
```sql
COPY source_table
-TO 'test/table_with_options'
-(format parquet,
-compression snappy,
-'compression::col1' 'zstd(5)',
-partition_by 'column3, column4'
-)
+ TO 'test/table_with_options'
+ PARTITIONED BY (column3, column4)
+ OPTIONS (
+ format parquet,
+ compression snappy,
+ 'compression::column1' 'zstd(5)',
+ )
```
In this example, we write the entirety of `source_table` out to a folder of
parquet files. One parquet file will be written in parallel to the folder for
each partition in the query. The next option `compression` set to `snappy`
indicates that unless otherwise specified all columns should use the snappy
compression codec. The option `compression::column1` sets an override, so that the
column `column1` in the parquet file will use `ZSTD` compression codec with
compression level `5`. In general, [...]
## Available Options
-### COPY Specific Options
-
-The following special options are specific to the `COPY` command.
-
-| Option | Description
| Default Value |
-| ------------ |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| ------------- |
-| FORMAT | Specifies the file format COPY query will write out. If
there're more than one output file or the format cannot be inferred from the
file extension, then FORMAT must be specified. | N/A |
-| PARTITION_BY | Specifies the columns that the output files should be
partitioned by into separate hive-style directories. Value should be a comma
separated string literal, e.g. 'col1,col2' | N/A |
-
### JSON Format Specific Options
The following options are available when writing JSON files. Note: If any
unsupported option is specified, an error will be raised and the query will
fail.
diff --git a/searchindex.js b/searchindex.js
index 2fb0a85217..3158f67541 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/architecture",
"contributor-guide/communication", "contributor-guide/index",
"contributor-guide/quarterly_roadmap", "contributor-guide/roadmap",
"contributor-guide/specification/index",
"contributor-guide/specification/invariants",
"contributor-guide/specification/output-field-name-semantic", "index",
"library-user-guide/adding-udfs", "library-user-guide/building-logical-plans",
"library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/architecture",
"contributor-guide/communication", "contributor-guide/index",
"contributor-guide/quarterly_roadmap", "contributor-guide/roadmap",
"contributor-guide/specification/index",
"contributor-guide/specification/invariants",
"contributor-guide/specification/output-field-name-semantic", "index",
"library-user-guide/adding-udfs", "library-user-guide/building-logical-plans",
"library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
diff --git a/user-guide/sql/dml.html b/user-guide/sql/dml.html
index 2b48fdb9c6..305596cbf5 100644
--- a/user-guide/sql/dml.html
+++ b/user-guide/sql/dml.html
@@ -361,11 +361,25 @@
<a class="reference internal nav-link" href="#copy">
COPY
</a>
+ <ul class="nav section-nav flex-column">
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#examples">
+ Examples
+ </a>
+ </li>
+ </ul>
</li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#insert">
INSERT
</a>
+ <ul class="nav section-nav flex-column">
+ <li class="toc-h3 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#id1">
+ Examples
+ </a>
+ </li>
+ </ul>
</li>
</ul>
@@ -428,7 +442,19 @@ TO '<i><b>file_name</i></b>'
[ PARTITIONED BY <i><b>column_name</i></b> [, ...] ]
[ OPTIONS( <i><b>option</i></b> [, ... ] ) ]
</pre>
+<p><code class="docutils literal notranslate"><span class="pre">STORED</span>
<span class="pre">AS</span></code> specifies the file format the <code
class="docutils literal notranslate"><span class="pre">COPY</span></code>
command will write. If this
+clause is not specified, it will be inferred from the file extension if
possible.</p>
+<p><code class="docutils literal notranslate"><span
class="pre">PARTITIONED</span> <span class="pre">BY</span></code> specifies the
columns to use for partitioning the output files into
+separate hive-style directories.</p>
+<p>The output format is determined by the first match of the following
rules:</p>
+<ol class="arabic simple">
+<li><p>Value of <code class="docutils literal notranslate"><span
class="pre">STORED</span> <span class="pre">AS</span></code></p></li>
+<li><p>Value of the <code class="docutils literal notranslate"><span
class="pre">OPTION</span> <span class="pre">(FORMAT</span> <span
class="pre">..)</span></code></p></li>
+<li><p>Filename extension (e.g. <code class="docutils literal
notranslate"><span class="pre">foo.parquet</span></code> implies <code
class="docutils literal notranslate"><span class="pre">PARQUET</span></code>
format)</p></li>
+</ol>
<p>For a detailed list of valid OPTIONS, see <a class="reference internal"
href="write_options.html"><span class="doc std std-doc">Write
Options</span></a>.</p>
+<section id="examples">
+<h3>Examples<a class="headerlink" href="#examples" title="Link to this
heading">¶</a></h3>
<p>Copy the contents of <code class="docutils literal notranslate"><span
class="pre">source_table</span></code> to <code class="docutils literal
notranslate"><span class="pre">file_name.json</span></code> in JSON format:</p>
<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="o">></span><span class="w">
</span><span class="k">COPY</span><span class="w"> </span><span
class="n">source_table</span><span class="w"> </span><span
class="k">TO</span><span class="w"> </span><span
class="s1">'file_name.json'</span><span class="p">;</span>
<span class="o">+</span><span class="c1">-------+</span>
@@ -458,6 +484,21 @@ of hive-style partitioned parquet files:</p>
<span class="o">+</span><span class="c1">-------+</span>
</pre></div>
</div>
+<p>If the data contains values of <code class="docutils literal
notranslate"><span class="pre">x</span></code> and <code class="docutils
literal notranslate"><span class="pre">y</span></code> in column1 and only
<code class="docutils literal notranslate"><span class="pre">a</span></code> in
+column2, output files will appear in the following directory structure:</p>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span><span class="n">dir_name</span><span
class="o">/</span>
+ <span class="n">column1</span><span class="o">=</span><span
class="n">x</span><span class="o">/</span>
+ <span class="n">column2</span><span class="o">=</span><span
class="n">a</span><span class="o">/</span>
+ <span class="o"><</span><span class="n">file</span><span
class="o">>.</span><span class="n">parquet</span>
+ <span class="o"><</span><span class="n">file</span><span
class="o">>.</span><span class="n">parquet</span>
+ <span class="o">...</span>
+ <span class="n">column1</span><span class="o">=</span><span
class="n">y</span><span class="o">/</span>
+ <span class="n">column2</span><span class="o">=</span><span
class="n">a</span><span class="o">/</span>
+ <span class="o"><</span><span class="n">file</span><span
class="o">>.</span><span class="n">parquet</span>
+ <span class="o"><</span><span class="n">file</span><span
class="o">>.</span><span class="n">parquet</span>
+ <span class="o">...</span>
+</pre></div>
+</div>
<p>Run the query <code class="docutils literal notranslate"><span
class="pre">SELECT</span> <span class="pre">*</span> <span
class="pre">from</span> <span class="pre">source</span> <span
class="pre">ORDER</span> <span class="pre">BY</span> <span
class="pre">time</span></code> and write the
results (maintaining the order) to a parquet file named
<code class="docutils literal notranslate"><span
class="pre">output.parquet</span></code> with a maximum parquet row group size
of 10MB:</p>
@@ -469,15 +510,12 @@ results (maintaining the order) to a parquet file named
<span class="o">+</span><span class="c1">-------+</span>
</pre></div>
</div>
-<p>The output format is determined by the first match of the following
rules:</p>
-<ol class="arabic simple">
-<li><p>Value of <code class="docutils literal notranslate"><span
class="pre">STORED</span> <span class="pre">AS</span></code></p></li>
-<li><p>Value of the <code class="docutils literal notranslate"><span
class="pre">OPTION</span> <span class="pre">(FORMAT</span> <span
class="pre">..)</span></code></p></li>
-<li><p>Filename extension (e.g. <code class="docutils literal
notranslate"><span class="pre">foo.parquet</span></code> implies <code
class="docutils literal notranslate"><span class="pre">PARQUET</span></code>
format)</p></li>
-</ol>
+</section>
</section>
<section id="insert">
<h2>INSERT<a class="headerlink" href="#insert" title="Link to this
heading">¶</a></h2>
+<section id="id1">
+<h3>Examples<a class="headerlink" href="#id1" title="Link to this
heading">¶</a></h3>
<p>Insert values into a table.</p>
<pre>
INSERT INTO <i><b>table_name</i></b> { VALUES ( <i><b>expression</i></b> [,
...] ) [, ...] | <i><b>query</i></b> }
@@ -491,6 +529,7 @@ INSERT INTO <i><b>table_name</i></b> { VALUES (
<i><b>expression</i></b> [, ...]
</pre></div>
</div>
</section>
+</section>
</section>
diff --git a/user-guide/sql/write_options.html
b/user-guide/sql/write_options.html
index df59ffd1e3..08684a8c4a 100644
--- a/user-guide/sql/write_options.html
+++ b/user-guide/sql/write_options.html
@@ -367,11 +367,6 @@
Available Options
</a>
<ul class="nav section-nav flex-column">
- <li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link" href="#copy-specific-options">
- COPY Specific Options
- </a>
- </li>
<li class="toc-h3 nav-item toc-entry">
<a class="reference internal nav-link"
href="#json-format-specific-options">
JSON Format Specific Options
@@ -449,54 +444,37 @@
<p>For a list of supported session level config defaults see <a
class="reference internal" href="../configs.html"><span class="doc std
std-doc">Configuration Settings</span></a>. These defaults apply to all write
operations but have the lowest level of precedence.</p>
<p>If inserting to an external table, table specific write options can be
specified when the table is created using the <code class="docutils literal
notranslate"><span class="pre">OPTIONS</span></code> clause:</p>
<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="k">CREATE</span><span
class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span
class="k">TABLE</span>
-<span class="n">my_table</span><span class="p">(</span><span
class="n">a</span><span class="w"> </span><span class="nb">bigint</span><span
class="p">,</span><span class="w"> </span><span class="n">b</span><span
class="w"> </span><span class="nb">bigint</span><span class="p">)</span>
-<span class="n">STORED</span><span class="w"> </span><span
class="k">AS</span><span class="w"> </span><span class="n">csv</span>
-<span class="n">COMPRESSION</span><span class="w"> </span><span
class="k">TYPE</span><span class="w"> </span><span class="n">gzip</span>
-<span class="k">WITH</span><span class="w"> </span><span
class="n">HEADER</span><span class="w"> </span><span class="k">ROW</span>
-<span class="k">DELIMITER</span><span class="w"> </span><span
class="s1">';'</span>
-<span class="k">LOCATION</span><span class="w"> </span><span
class="s1">'/test/location/my_csv_table/'</span>
-<span class="k">OPTIONS</span><span class="p">(</span>
-<span class="n">NULL_VALUE</span><span class="w"> </span><span
class="s1">'NAN'</span>
-<span class="p">);</span>
+<span class="w"> </span><span class="n">my_table</span><span
class="p">(</span><span class="n">a</span><span class="w"> </span><span
class="nb">bigint</span><span class="p">,</span><span class="w"> </span><span
class="n">b</span><span class="w"> </span><span class="nb">bigint</span><span
class="p">)</span>
+<span class="w"> </span><span class="n">STORED</span><span class="w">
</span><span class="k">AS</span><span class="w"> </span><span
class="n">csv</span>
+<span class="w"> </span><span class="n">COMPRESSION</span><span class="w">
</span><span class="k">TYPE</span><span class="w"> </span><span
class="n">gzip</span>
+<span class="w"> </span><span class="k">WITH</span><span class="w">
</span><span class="n">HEADER</span><span class="w"> </span><span
class="k">ROW</span>
+<span class="w"> </span><span class="k">DELIMITER</span><span class="w">
</span><span class="s1">';'</span>
+<span class="w"> </span><span class="k">LOCATION</span><span class="w">
</span><span class="s1">'/test/location/my_csv_table/'</span>
+<span class="w"> </span><span class="k">OPTIONS</span><span class="p">(</span>
+<span class="w"> </span><span class="n">NULL_VALUE</span><span class="w">
</span><span class="s1">'NAN'</span>
+<span class="w"> </span><span class="p">)</span>
</pre></div>
</div>
<p>When running <code class="docutils literal notranslate"><span
class="pre">INSERT</span> <span class="pre">INTO</span> <span
class="pre">my_table</span> <span class="pre">...</span></code>, the options
from the <code class="docutils literal notranslate"><span
class="pre">CREATE</span> <span class="pre">TABLE</span></code> will be
respected (gzip compression, special delimiter, and header row included). There
will be a single output file if the output path doesn’t have folder format, i.
[...]
<p>Finally, options can be passed when running a <code class="docutils literal
notranslate"><span class="pre">COPY</span></code> command.</p>
+<!--
+ Test the following example with:
+ CREATE TABLE source_table AS VALUES ('1','2','3','4');
+-->
<div class="highlight-sql notranslate"><div
class="highlight"><pre><span></span><span class="k">COPY</span><span class="w">
</span><span class="n">source_table</span>
-<span class="k">TO</span><span class="w"> </span><span
class="s1">'test/table_with_options'</span>
-<span class="p">(</span><span class="n">format</span><span class="w">
</span><span class="n">parquet</span><span class="p">,</span>
-<span class="n">compression</span><span class="w"> </span><span
class="n">snappy</span><span class="p">,</span>
-<span class="s1">'compression::col1'</span><span class="w">
</span><span class="s1">'zstd(5)'</span><span class="p">,</span>
-<span class="n">partition_by</span><span class="w"> </span><span
class="s1">'column3, column4'</span>
-<span class="p">)</span>
+<span class="w"> </span><span class="k">TO</span><span class="w">
</span><span class="s1">'test/table_with_options'</span>
+<span class="w"> </span><span class="n">PARTITIONED</span><span class="w">
</span><span class="k">BY</span><span class="w"> </span><span
class="p">(</span><span class="n">column3</span><span class="p">,</span><span
class="w"> </span><span class="n">column4</span><span class="p">)</span>
+<span class="w"> </span><span class="k">OPTIONS</span><span class="w">
</span><span class="p">(</span>
+<span class="w"> </span><span class="n">format</span><span class="w">
</span><span class="n">parquet</span><span class="p">,</span>
+<span class="w"> </span><span class="n">compression</span><span class="w">
</span><span class="n">snappy</span><span class="p">,</span>
+<span class="w"> </span><span
class="s1">'compression::column1'</span><span class="w"> </span><span
class="s1">'zstd(5)'</span><span class="p">,</span>
+<span class="w"> </span><span class="p">)</span>
</pre></div>
</div>
<p>In this example, we write the entirety of <code class="docutils literal
notranslate"><span class="pre">source_table</span></code> out to a folder of
parquet files. One parquet file will be written in parallel to the folder for
each partition in the query. The next option <code class="docutils literal
notranslate"><span class="pre">compression</span></code> set to <code
class="docutils literal notranslate"><span class="pre">snappy</span></code>
indicates that unless otherwise specified [...]
</section>
<section id="available-options">
<h2>Available Options<a class="headerlink" href="#available-options"
title="Link to this heading">¶</a></h2>
-<section id="copy-specific-options">
-<h3>COPY Specific Options<a class="headerlink" href="#copy-specific-options"
title="Link to this heading">¶</a></h3>
-<p>The following special options are specific to the <code class="docutils
literal notranslate"><span class="pre">COPY</span></code> command.</p>
-<table class="table">
-<thead>
-<tr class="row-odd"><th class="head"><p>Option</p></th>
-<th class="head"><p>Description</p></th>
-<th class="head"><p>Default Value</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="row-even"><td><p>FORMAT</p></td>
-<td><p>Specifies the file format COPY query will write out. If there’re more
than one output file or the format cannot be inferred from the file extension,
then FORMAT must be specified.</p></td>
-<td><p>N/A</p></td>
-</tr>
-<tr class="row-odd"><td><p>PARTITION_BY</p></td>
-<td><p>Specifies the columns that the output files should be partitioned by
into separate hive-style directories. Value should be a comma separated string
literal, e.g. ‘col1,col2’</p></td>
-<td><p>N/A</p></td>
-</tr>
-</tbody>
-</table>
-</section>
<section id="json-format-specific-options">
<h3>JSON Format Specific Options<a class="headerlink"
href="#json-format-specific-options" title="Link to this heading">¶</a></h3>
<p>The following options are available when writing JSON files. Note: If any
unsupported option is specified, an error will be raised and the query will
fail.</p>