[PR] Support Copy To Partitioned Files [arrow-datafusion]

via GitHub Thu, 15 Feb 2024 07:02:49 -0800


devinjdangelo opened a new pull request, #9240:
URL: https://github.com/apache/arrow-datafusion/pull/9240


   ## Which issue does this PR close?
   
   Closes #9237
   Closes #8493
   
   ## Rationale for this change
   
   We currently support writing to partitioned listing tables. It would be nice 
to leverage the same physical plan implementation within a CopyTo statement, so 
users can skip registering a table if they don't need one.
   
   ## What changes are included in this PR?
   
   - Extends the `CopyTo` LogicalPlan to support a `partition_by` option
   - Extends `CopyTo` physical planning to propagate the `partition_by` columns to the `FileSinkExec` plan
   - Extends `DataFrameWriteOptions` to support the same `partition_by` option
   - Adds partition column `DataType` inference to the demux code, so users do not have to specify the data type explicitly in the `COPY` statement
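
As a rough illustration of the last bullet, the sketch below shows the idea in Python rather than DataFusion's Rust. The names `infer_partition_types` and `demux_rows` are hypothetical, the schema is a plain dict of column name to type name, and rows are plain dicts. It also assumes partition columns are dropped from the written data because their values are carried by the output path. None of this is DataFusion's actual API.

```python
from collections import defaultdict


def infer_partition_types(schema, partition_by):
    """Look up each partition column's type in the input schema.

    Hypothetical helper: the type comes from the schema, so the
    caller never has to spell it out alongside the column name.
    """
    missing = [col for col in partition_by if col not in schema]
    if missing:
        raise ValueError(f"unknown partition columns: {missing}")
    return [(col, schema[col]) for col in partition_by]


def demux_rows(rows, partition_by):
    """Group rows by the values of their partition columns."""
    groups = defaultdict(list)
    for row in rows:
        # One group per distinct combination of partition values.
        key = tuple(str(row[col]) for col in partition_by)
        # Assumption: partition values live in the output path,
        # so they are removed from the data written to each file.
        data = {name: value for name, value in row.items()
                if name not in partition_by}
        groups[key].append(data)
    return dict(groups)


schema = {"column1": "Int64", "column2": "Utf8", "column3": "Utf8"}
rows = [
    {"column1": 1, "column2": "a", "column3": "x"},
    {"column1": 2, "column2": "b", "column3": "y"},
    {"column1": 3, "column2": "a", "column3": "z"},
]
print(infer_partition_types(schema, ["column2", "column3"]))
print(demux_rows(rows, ["column2"]))
```

Each key of the returned dict corresponds to one output partition, and each value holds the rows that would be written to it.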
   
   With these changes we can support queries like the following:
   
   ```sql
   COPY source_table TO 'test_files/scratch/copy/partitioned_table1/' 
   (format parquet, compression 'zstd(10)', partition_by 'col2');
   
   COPY (values (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z'))
   TO 'test_files/scratch/copy/partitioned_table2/'
   (format parquet, compression 'zstd(10)', partition_by 'column2, column3');
   ```
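
To make the effect of the second query concrete, here is a minimal Python sketch of how a comma-separated `partition_by` value could map rows to output directories. It assumes Hive-style `column=value` directory names and simple comma splitting. The helpers `parse_partition_by` and `partition_dir` are hypothetical and are not part of DataFusion.

```python
from pathlib import PurePosixPath


def parse_partition_by(option):
    """Split a 'col_a, col_b' option string into column names."""
    return [name.strip() for name in option.split(",") if name.strip()]


def partition_dir(base, row, partition_cols):
    """Build the directory for one row, one path segment per column."""
    path = PurePosixPath(base)
    for col in partition_cols:
        # Assumption: Hive-style 'column=value' segments.
        path = path / f"{col}={row[col]}"
    return str(path)


cols = parse_partition_by("column2, column3")
rows = [
    {"column1": 1, "column2": "a", "column3": "x"},
    {"column1": 2, "column2": "b", "column3": "y"},
    {"column1": 3, "column2": "c", "column3": "z"},
]
for row in rows:
    print(partition_dir("test_files/scratch/copy/partitioned_table2/", row, cols))
```

Under these assumptions, each of the three rows lands in its own nested directory, since no two rows share the same `column2` and `column3` values.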
   
   ## Are these changes tested?
   
   Yes, the examples above are validated in copy.slt tests.
   
   ## Are there any user-facing changes?
   
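   Yes. `COPY ... TO` and `DataFrameWriteOptions` now accept a `partition_by` option, so users can write partitioned files without first registering a listing table.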


