(arrow-datafusion) branch main updated: Update Docs with Copy partition_by support (#9275)

agrove Tue, 20 Feb 2024 17:10:30 -0800

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git



The following commit(s) were added to refs/heads/main by this push:
     new 89ee9b0c9b Update Docs with Copy partition_by support (#9275)
89ee9b0c9b is described below

commit 89ee9b0c9b27324a3662e5b50b56902eef7d7749
Author: Devin D'Angelo <[email protected]>
AuthorDate: Tue Feb 20 20:09:36 2024 -0500

    Update Docs with Copy partition_by support (#9275)
    
    * update copy docs
    
    * prettier
---
 docs/source/user-guide/sql/dml.md           | 12 ++++++++++++
 docs/source/user-guide/sql/write_options.md |  8 +++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/docs/source/user-guide/sql/dml.md b/docs/source/user-guide/sql/dml.md
index 79b1d6625e..405e77a21b 100644
--- a/docs/source/user-guide/sql/dml.md
+++ b/docs/source/user-guide/sql/dml.md
@@ -57,6 +57,18 @@ files in the `dir_name` directory:
 +-------+
 ```
 
+Copy the contents of `source_table` to multiple directories
+of hive-style partitioned parquet files:
+
+```sql
+> COPY source_table TO 'dir_name' (FORMAT parquet, partition_by 'column1, column2');
++-------+
+| count |
++-------+
+| 2     |
++-------+
+```
+
 Run the query `SELECT * from source ORDER BY time` and write the
 results (maintaining the order) to a parquet file named
 `output.parquet` with a maximum parquet row group size of 10MB:
diff --git a/docs/source/user-guide/sql/write_options.md b/docs/source/user-guide/sql/write_options.md
index 09d51903f4..ac0a41a97f 100644
--- a/docs/source/user-guide/sql/write_options.md
+++ b/docs/source/user-guide/sql/write_options.md
@@ -56,6 +56,7 @@ TO 'test/table_with_options'
 (format parquet,
 compression snappy,
 'compression::col1' 'zstd(5)',
+partition_by 'column3, column4'
 )
 ```
 
@@ -67,9 +68,10 @@ In this example, we write the entirety of `source_table` out to a folder of parq
 
 The following special options are specific to the `COPY` command.
 
-| Option | Description                                                                                                                                                                         | Default Value |
-| ------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
-| FORMAT | Specifies the file format COPY query will write out. If there're more than one output file or the format cannot be inferred from the file extension, then FORMAT must be specified. | N/A           |
+| Option       | Description                                                                                                                                                                         | Default Value |
+| ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
+| FORMAT       | Specifies the file format COPY query will write out. If there're more than one output file or the format cannot be inferred from the file extension, then FORMAT must be specified. | N/A           |
+| PARTITION_BY | Specifies the columns that the output files should be partitioned by into separate hive-style directories. Value should be a comma separated string literal, e.g. 'col1,col2'       | N/A           |
 
 ### JSON Format Specific Options
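
For readers unfamiliar with the term, "hive-style" partitioning conventionally means one nested directory per partition column, named `column=value`. The sketch below illustrates only that naming convention, applied to a comma-separated `partition_by` string like the one in the docs above. It is not DataFusion's implementation: the helper names `parse_partition_by` and `hive_partition_dir` are hypothetical, the sample rows are made up, and the names of the files written inside each directory, as well as how special characters in values are encoded, are not covered here.

```python
from pathlib import PurePosixPath


def parse_partition_by(option: str) -> list[str]:
    """Split a comma-separated option string such as 'column1, column2'.

    Hypothetical helper: whitespace around each column name is ignored.
    """
    return [name.strip() for name in option.split(",") if name.strip()]


def hive_partition_dir(base_dir: str, columns: list[str], row: dict) -> str:
    """Build the conventional hive-style directory for one row.

    The result has the shape base_dir/col1=value1/col2=value2.
    """
    path = PurePosixPath(base_dir)
    for col in columns:
        path = path / f"{col}={row[col]}"
    return str(path)


# Illustrative data only; these rows are not taken from the commit.
columns = parse_partition_by("column1, column2")
rows = [
    {"column1": "a", "column2": 1, "value": 10},
    {"column1": "b", "column2": 2, "value": 20},
]

# Each distinct combination of partition values maps to its own directory.
dirs = sorted({hive_partition_dir("dir_name", columns, r) for r in rows})
print(dirs)
```

Under this convention, two rows with distinct partition values land in two separate directories beneath `dir_name`.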
