devinjdangelo opened a new pull request, #7283: URL: https://github.com/apache/arrow-datafusion/pull/7283
## Which issue does this PR close?

Closes #5076. Part of #5654.

## Rationale for this change

In many cases, we want to export data to file(s) in an ObjectStore without first registering an external table. `COPY ... TO ...` statements make this possible. Part of the implementation can reuse the FileSinks created to support inserting into ListingTables.

## What changes are included in this PR?

- Implement a logical plan for Copy To statements
- Generalize the name of `InsertExec` to `FileSinkExec`
- Implement a physical plan for Copy To statements that relies on `FileSinkExec`
- Expand the sqllogictests in copy.slt, and add support for an automatically cleared directory in sqllogictests so that tests write files fresh
- Reimplement the `DataFrame::write_*` methods to use `Copy To`
- Add support for a [per_thread_output](https://duckdb.org/docs/sql/statements/copy.html) setting in `FileSinks` and `Copy To`, so the user can specify whether they want exactly one output file or whether multiple files are acceptable

Note that this PR does not yet add support for most statement-level settings / overrides. That will be important to implement before closing out #5654.

This graphic shows how all of the write-related code is wired up after this PR: [image]

## Are these changes tested?

Yes, see the expanded copy.slt for new tests. I also plan to expand insert.slt to improve testing of the recently added `insert into` support.

## Are there any user-facing changes?

Copy To statements (minus most statement-level overrides) are now supported. The DataFrame `write_*` APIs have some small changes, and will need more changes as support for statement-level overrides is added for Copy To.
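To make the `per_thread_output` setting described above more concrete, here is a minimal Rust sketch of the choice it controls: one output file versus possibly many. This is not DataFusion code. The function name `planned_output_files`, its parameters, and the `part-N` file naming are all assumptions made up for illustration; the actual `FileSinkExec` implementation may lay out files differently.

```rust
/// Hypothetical helper (not part of DataFusion): lists the files a
/// `COPY ... TO` style write might produce for a given target path.
pub fn planned_output_files(
    target: &str,
    extension: &str,
    num_partitions: usize,
    per_thread_output: bool,
) -> Vec<String> {
    if !per_thread_output {
        // Single-file mode: all partitions end up in one file at the target path.
        return vec![target.to_string()];
    }
    // Multi-file mode (assumed layout): treat the target as a directory and
    // emit one file per input partition.
    let dir = target.trim_end_matches('/');
    (0..num_partitions)
        .map(|i| format!("{}/part-{}.{}", dir, i, extension))
        .collect()
}
```

Under these assumptions, a call with `per_thread_output` set to `false` always yields exactly one path, while `true` yields one path per partition, which matches the "only one file or possibly many" behavior the setting is meant to expose.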
