[GitHub] [arrow-datafusion] devinjdangelo opened a new pull request, #7141: Unify DataFrame and SQL (Insert Into) Write Methods

via GitHub Sun, 30 Jul 2023 09:26:55 -0700


devinjdangelo opened a new pull request, #7141:
URL: https://github.com/apache/arrow-datafusion/pull/7141


   # Which issue does this PR close?
   Closes #5076 
   
   # Rationale for this change
   The goal of this PR is to let DataFrame write methods share a common 
implementation with SQL `Insert Into` statements, so that logic related to 
writing via `ObjectStore`, parallelization, and other optimizations (such as 
those discussed in #7079) can be implemented in one place.
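To illustrate the idea, here is a toy Rust sketch. It is hypothetical: `LogicalPlan`, `plan_sql_insert`, `DataFrame`, `insert_plan`, `write_table`, `scan`, and `execute` are simplified stand-ins, not DataFusion's actual types or signatures, and a `HashMap` catalog stands in for real storage. Both the SQL path and the DataFrame path lower to the same `InsertInto` node, so a single `execute` function owns the write logic.

```rust
// Toy model only: every name below is hypothetical and simplified,
// not DataFusion's real API.
use std::collections::HashMap;

/// A row is a list of string fields; a catalog maps table names to rows.
type Row = Vec<String>;
type Catalog = HashMap<String, Vec<Row>>;

/// A simplified logical plan with one read node and one write node.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    /// Literal rows, standing in for any query result.
    Values(Vec<Row>),
    /// Write the rows produced by `input` into `table`.
    InsertInto { table: String, input: Box<LogicalPlan> },
}

/// SQL front end: `INSERT INTO t VALUES ...` lowers to an `InsertInto` node.
fn plan_sql_insert(table: &str, rows: Vec<Row>) -> LogicalPlan {
    LogicalPlan::InsertInto {
        table: table.to_string(),
        input: Box::new(LogicalPlan::Values(rows)),
    }
}

/// Toy DataFrame: just a wrapper around the plan that produces its rows.
#[derive(Debug, Clone)]
struct DataFrame {
    plan: LogicalPlan,
}

impl DataFrame {
    /// Wraps this DataFrame's plan in the same `InsertInto` node SQL uses.
    fn insert_plan(self, table: &str) -> LogicalPlan {
        LogicalPlan::InsertInto {
            table: table.to_string(),
            input: Box::new(self.plan),
        }
    }

    /// DataFrame front end: build the insert plan and execute it eagerly.
    fn write_table(self, table: &str, catalog: &mut Catalog) -> usize {
        let plan = self.insert_plan(table);
        execute(&plan, catalog)
    }
}

/// Produces the rows of a read-only plan (a nested insert yields no rows here).
fn scan(plan: &LogicalPlan) -> Vec<Row> {
    match plan {
        LogicalPlan::Values(rows) => rows.clone(),
        LogicalPlan::InsertInto { .. } => Vec::new(),
    }
}

/// The single shared write path; returns the number of rows written.
fn execute(plan: &LogicalPlan, catalog: &mut Catalog) -> usize {
    match plan {
        // A plain read writes nothing.
        LogicalPlan::Values(_) => 0,
        LogicalPlan::InsertInto { table, input } => {
            let rows = scan(input);
            let written = rows.len();
            catalog.entry(table.clone()).or_default().extend(rows);
            written
        }
    }
}
```

Because both front ends produce an identical plan, any change to `execute` (for example, parallel or `ObjectStore`-aware writing) applies to SQL inserts and DataFrame writes alike.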
   
   # What changes are included in this PR?
   The following changes are completed/planned:
   
   - [x] Implement `DataFrame.write_table` method, which creates an `insert_into` 
`LogicalPlan` and executes it eagerly
   - [x] Extend `InsertExec` / `DataSink` / `ListingTable.insert_into` to 
support writing multiple files from multiple partitions
   - [x] Extend `CsvSink` to support writing multiple partitions to multiple 
files
   - [ ] Create `JsonSink` supporting writing multiple partitions to multiple 
files
   - [ ] Create `ParquetSink` supporting writing multiple partitions to 
multiple files
   - [ ] Update existing `write_json`, `write_csv`, and `write_parquet` to 
create temporary tables and to call `DataFrame.write_table` 
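A minimal Rust sketch of the "multiple partitions to multiple files" behavior, assuming a hypothetical `Sink` trait and a `BTreeMap` (`InMemoryStore`) standing in for an `ObjectStore`. None of `Sink`, `CsvLikeSink`, `InMemoryStore`, or the `part-{i}.csv` naming scheme come from DataFusion; they only illustrate writing one output object per input partition.

```rust
// Hypothetical illustration only; not DataFusion's DataSink or CsvSink API.
use std::collections::BTreeMap;

/// Stand-in for an object store: maps object paths to their bytes.
type InMemoryStore = BTreeMap<String, Vec<u8>>;

/// A partition is an ordered batch of rows; each row is a list of fields.
type Partition = Vec<Vec<String>>;

/// Hypothetical sink trait: consumes all input partitions.
trait Sink {
    /// Writes every partition and returns the total number of rows written.
    fn write_all(&self, partitions: &[Partition], store: &mut InMemoryStore) -> usize;
}

/// Writes comma-separated rows, one output object per input partition.
struct CsvLikeSink {
    /// Directory-like prefix under which part files are created.
    base_path: String,
}

impl Sink for CsvLikeSink {
    fn write_all(&self, partitions: &[Partition], store: &mut InMemoryStore) -> usize {
        let mut total = 0;
        for (i, partition) in partitions.iter().enumerate() {
            // One output object per input partition, e.g. "out/part-0.csv".
            let path = format!("{}/part-{}.csv", self.base_path, i);
            let mut bytes = Vec::new();
            for row in partition {
                // Naive CSV: assumes no field contains a comma, quote, or newline.
                bytes.extend_from_slice(row.join(",").as_bytes());
                bytes.push(b'\n');
            }
            store.insert(path, bytes);
            total += partition.len();
        }
        total
    }
}
```

The loop over partitions is sequential here; it is the natural place where per-partition writes could run concurrently, which is the kind of optimization a shared write path makes easier to add once.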
   
   
   
   # Are these changes tested?
   
   I have not yet implemented any new tests to cover these changes. Any 
suggestions on new tests are welcome.
   
   # Are there any user-facing changes?
   
   The goal is for existing `DataFrame` write methods to behave nearly 
identically to before. The `write_table` method is a new public method.


