This is an automated email from the ASF dual-hosted git repository.
agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 89ee9b0c9b Update Docs with Copy partition_by support (#9275)
89ee9b0c9b is described below
commit 89ee9b0c9b27324a3662e5b50b56902eef7d7749
Author: Devin D'Angelo <[email protected]>
AuthorDate: Tue Feb 20 20:09:36 2024 -0500
Update Docs with Copy partition_by support (#9275)
* update copy docs
* prettier
---
docs/source/user-guide/sql/dml.md | 12 ++++++++++++
docs/source/user-guide/sql/write_options.md | 8 +++++---
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/docs/source/user-guide/sql/dml.md b/docs/source/user-guide/sql/dml.md
index 79b1d6625e..405e77a21b 100644
--- a/docs/source/user-guide/sql/dml.md
+++ b/docs/source/user-guide/sql/dml.md
@@ -57,6 +57,18 @@ files in the `dir_name` directory:
+-------+
```
+Copy the contents of `source_table` to multiple directories
+of hive-style partitioned parquet files:
+
+```sql
+> COPY source_table TO 'dir_name' (FORMAT parquet, partition_by 'column1, column2');
++-------+
+| count |
++-------+
+| 2 |
++-------+
+```
+
Run the query `SELECT * from source ORDER BY time` and write the
results (maintaining the order) to a parquet file named
`output.parquet` with a maximum parquet row group size of 10MB:
diff --git a/docs/source/user-guide/sql/write_options.md b/docs/source/user-guide/sql/write_options.md
index 09d51903f4..ac0a41a97f 100644
--- a/docs/source/user-guide/sql/write_options.md
+++ b/docs/source/user-guide/sql/write_options.md
@@ -56,6 +56,7 @@ TO 'test/table_with_options'
(format parquet,
compression snappy,
'compression::col1' 'zstd(5)',
+partition_by 'column3, column4'
)
```
@@ -67,9 +68,10 @@ In this example, we write the entirety of `source_table` out to a folder of parq
The following special options are specific to the `COPY` command.
-| Option | Description | Default Value |
-| ------ | ----------- | ------------- |
-| FORMAT | Specifies the file format COPY query will write out. If there're more than one output file or the format cannot be inferred from the file extension, then FORMAT must be specified. | N/A |
+| Option | Description | Default Value |
+| ------------ | ----------- | ------------- |
+| FORMAT | Specifies the file format COPY query will write out. If there're more than one output file or the format cannot be inferred from the file extension, then FORMAT must be specified. | N/A |
+| PARTITION_BY | Specifies the columns that the output files should be partitioned by into separate hive-style directories. Value should be a comma separated string literal, e.g. 'col1,col2' | N/A |
### JSON Format Specific Options