This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 2956ec2962 Update `COPY` documentation to reflect cahnges (#9754)
2956ec2962 is described below
commit 2956ec2962d7af94be53243427f8795d29fa90a3
Author: Andrew Lamb
AuthorDate: Fri Mar 29 09:39:27 2024 -0400
Update `COPY` documentation to reflect cahnges (#9754)
---
docs/source/user-guide/sql/dml.md | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/docs/source/user-guide/sql/dml.md b/docs/source/user-guide/sql/dml.md
index b9614bb8f9..79c36092fd 100644
--- a/docs/source/user-guide/sql/dml.md
+++ b/docs/source/user-guide/sql/dml.md
@@ -25,11 +25,14 @@ and modifying data in tables.
## COPY
Copies the contents of a table or query to file(s). Supported file
-formats are `parquet`, `csv`, and `json` and can be inferred based on
-filename if writing to a single file.
+formats are `parquet`, `csv`, `json`, and `arrow`.
<pre>
-COPY { <i><b>table_name</i></b> | <i><b>query</i></b> } TO '<i><b>file_name</i></b>' [ ( <i><b>option</i></b> [, ... ] ) ]
+COPY { <i><b>table_name</i></b> | <i><b>query</i></b> }
+TO '<i><b>file_name</i></b>'
+[ STORED AS <i><b>format</i></b> ]
+[ PARTITIONED BY <i><b>column_name</i></b> [, ...] ]
+[ OPTIONS( <i><b>option</i></b> [, ... ] ) ]
</pre>
For a detailed list of valid OPTIONS, see [Write Options](write_options).
@@ -61,7 +64,7 @@ Copy the contents of `source_table` to multiple directories
of hive-style partitioned parquet files:
```sql
-> COPY source_table TO 'dir_name' (FORMAT parquet, partition_by 'column1, column2');
+> COPY source_table TO 'dir_name' STORED AS parquet, PARTITIONED BY (column1, column2);
+-------+
| count |
+-------+
@@ -74,7 +77,7 @@ results (maintaining the order) to a parquet file named
`output.parquet` with a maximum parquet row group size of 10MB:
```sql
-> COPY (SELECT * from source ORDER BY time) TO 'output.parquet' (ROW_GROUP_LIMIT_BYTES 10000000);
+> COPY (SELECT * from source ORDER BY time) TO 'output.parquet' OPTIONS (MAX_ROW_GROUP_SIZE 10000000);
+-------+
| count |
+-------+
@@ -82,6 +85,12 @@ results (maintaining the order) to a parquet file named
+-------+
```
+The output format is determined by the first match of the following rules:
+
+1. Value of `STORED AS`
+2. Value of the `OPTION (FORMAT ..)`
+3. Filename extension (e.g. `foo.parquet` implies `PARQUET` format)
+
## INSERT
Insert values into a table.