devinjdangelo opened a new pull request, #7283:
URL: https://github.com/apache/arrow-datafusion/pull/7283

   ## Which issue does this PR close?
   
   closes #5076 
   Part of #5654 
   
   ## Rationale for this change
   
   In many cases, we want to be able to export data to file(s) in an ObjectStore without first registering an external table. This is possible with `COPY ... TO ...` statements. For part of the implementation, we can leverage the FileSinks that were created to support inserting into ListingTables.
   
   ## What changes are included in this PR?
   
   - Implement a logical plan for Copy To statements
   - Rename `InsertExec` to the more general `FileSinkExec`
   - Implement a physical plan for Copy To statements relying on `FileSinkExec`
   - Expand the sqllogictests in copy.slt, and add sqllogictests support for an automatically cleared directory so that tests always write files fresh
   - Reimplement the `DataFrame::write_*` methods to use `Copy To`
   - Add support for the [`per_thread_output`](https://duckdb.org/docs/sql/statements/copy.html) setting in `FileSinks` and `Copy To`, so the user can specify whether they want a single output file or whether many output files are acceptable
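The `per_thread_output` choice can be made concrete with a minimal, std-only Rust sketch. It is not DataFusion code. `write_partitions`, `reset_dir`, and the `data.csv` / `part-{i}.csv` file names are hypothetical, and each inner `Vec` simply stands in for the rows produced by one partition. With `per_thread_output` set to false, all partitions are concatenated into a single file. With it set to true, each partition gets its own file inside the target directory, roughly as the DuckDB option linked above is described. The directory is cleared before each write, in the spirit of the automatically cleared sqllogictest directory.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Remove `dir` if it exists and recreate it empty, so every run starts fresh.
fn reset_dir(dir: &Path) -> io::Result<()> {
    if dir.exists() {
        fs::remove_dir_all(dir)?;
    }
    fs::create_dir_all(dir)
}

/// Hypothetical writer illustrating `per_thread_output`.
/// Each inner `Vec` holds the rows produced by one partition.
/// - `false`: all partitions are concatenated into a single file, `dir/data.csv`.
/// - `true`: partition `i` is written to its own file, `dir/part-{i}.csv`.
/// The file names are invented for this sketch.
fn write_partitions(
    dir: &Path,
    partitions: &[Vec<String>],
    per_thread_output: bool,
) -> io::Result<Vec<PathBuf>> {
    reset_dir(dir)?;
    let mut written = Vec::new();
    if per_thread_output {
        // One output file per partition (written sequentially here for simplicity).
        for (i, rows) in partitions.iter().enumerate() {
            let path = dir.join(format!("part-{i}.csv"));
            let mut file = fs::File::create(&path)?;
            for row in rows {
                writeln!(file, "{row}")?;
            }
            written.push(path);
        }
    } else {
        // A single output file; partitions are appended in order.
        let path = dir.join("data.csv");
        let mut file = fs::File::create(&path)?;
        for row in partitions.iter().flatten() {
            writeln!(file, "{row}")?;
        }
        written.push(path);
    }
    Ok(written)
}

fn main() -> io::Result<()> {
    let partitions = vec![
        vec!["1,a".to_string(), "2,b".to_string()],
        vec!["3,c".to_string()],
    ];
    let base = std::env::temp_dir().join("per_thread_output_demo");
    let many = write_partitions(&base.join("many"), &partitions, true)?;
    let one = write_partitions(&base.join("one"), &partitions, false)?;
    println!("per_thread_output=true  -> {} files", many.len());
    println!("per_thread_output=false -> {} file", one.len());
    Ok(())
}
```

The trade-off is the usual one. One file per partition lets each partition be written independently, without funneling everything through one writer. A single file is easier to hand to a consumer, but all partitions must be serialized into it.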
   
   Note that this PR does not yet add support for most statement-level settings/overrides. That will be important to implement before closing out #5654.
   
   This graphic shows how all of the write-related code is wired up after this PR:
   
![datafusion_writes](https://github.com/apache/arrow-datafusion/assets/14827143/bfa49a11-21db-4857-bcc7-72db7f362039)
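Based on the list of changes above (not on the graphic itself), the wiring can be approximated with a toy Rust model. Every type and function here (`LogicalWrite`, `FileSinkConfig`, `plan_file_sink`, `dataframe_write`) is hypothetical and far simpler than DataFusion's real plan nodes. The sketch only illustrates the shape of the change. `INSERT INTO` a ListingTable and `Copy To` are separate logical plans that both lower to one shared file-sink configuration, standing in for `FileSinkExec`. The `DataFrame::write_*` path becomes a `Copy To` plan instead of its own write path.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum FileFormat {
    Csv,
    Parquet,
}

/// Hypothetical logical write nodes (names follow the PR text; fields are invented).
#[derive(Debug, PartialEq)]
enum LogicalWrite {
    /// `INSERT INTO t ...`: location and format come from the registered ListingTable.
    InsertInto { table_location: String, format: FileFormat },
    /// `COPY ... TO '<path>'`: location and format come from the statement itself.
    CopyTo { output_url: String, format: FileFormat },
}

/// Hypothetical stand-in for the configuration handed to the shared file sink.
#[derive(Debug, PartialEq)]
struct FileSinkConfig {
    output_url: String,
    format: FileFormat,
}

/// Both logical write nodes lower to the same physical sink configuration.
fn plan_file_sink(plan: &LogicalWrite) -> FileSinkConfig {
    match plan {
        LogicalWrite::InsertInto { table_location: url, format }
        | LogicalWrite::CopyTo { output_url: url, format } => FileSinkConfig {
            output_url: url.clone(),
            format: *format,
        },
    }
}

/// Hypothetical helper mirroring the idea that `DataFrame::write_*`
/// now builds a `Copy To` plan rather than having its own write path.
fn dataframe_write(path: &str, format: FileFormat) -> LogicalWrite {
    LogicalWrite::CopyTo { output_url: path.to_string(), format }
}

fn main() {
    let insert = LogicalWrite::InsertInto {
        table_location: "file:///data/t/".to_string(),
        format: FileFormat::Parquet,
    };
    let copy = dataframe_write("file:///tmp/out/", FileFormat::Csv);
    for plan in [&insert, &copy] {
        println!("{:?} -> {:?}", plan, plan_file_sink(plan));
    }
}
```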
   
   
   ## Are these changes tested?
   
   Yes, see the expanded copy.slt for new tests.
   
   I also plan to expand insert.slt to improve test coverage of the recently added `insert into` support.
   
   ## Are there any user-facing changes?
   
   Copy To statements are now supported (except for most statement-level overrides).
   
   The DataFrame `write_*` APIs have some small changes, and they will need further changes as support for statement-level overrides is added to Copy To.


-- 
This is an automated message from the Apache Git Service.