[GitHub] [arrow-datafusion] metesynnada commented on a diff in pull request #6049: MemoryExec INSERT INTO refactor to use ExecutionPlan

via GitHub Thu, 27 Apr 2023 08:10:12 -0700


metesynnada commented on code in PR #6049:
URL: https://github.com/apache/arrow-datafusion/pull/6049#discussion_r1179310840



##########
datafusion/core/src/physical_plan/memory.rs:
##########
@@ -223,15 +245,365 @@ impl RecordBatchStream for MemoryStream {
     }
 }
 
+/// Execution plan for writing record batches to an in-memory table.
+pub struct MemoryWriteExec {

Review Comment:
   While experimenting with the single-plan idea, my main challenge was deciding
where the insert operation should execute, i.e., what the `insert_into()` API
should return. Should it return a `tokio::AsyncWrite` (which accepts a
`RecordBatch`, for use in a `futures::stream`), a `futures::stream`, or a new
trait tailored specifically to the `TableProvider` information?
   If the API were to return a `futures::stream` as the operator, a single
execution plan might not be possible without some workarounds. If it returned a
`tokio::AsyncWrite` instead, a single executor may be possible, though we
haven't used this in DataFusion before.
   All in all, having a distinct `ExecutionPlan` to handle writes for each data
source mirrors the behavior on the read side and seems much more natural. What
do you think?
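
   To make the last option concrete, here is a minimal, std-only sketch of a
write plan that wraps its input plan, the same way a scan plan sits on the read
side. Everything in it is a simplified assumption for illustration: `Batch`,
`Plan`, `MemoryScan`, `MemoryWrite`, and this `insert_into()` signature are
hypothetical stand-ins, not the DataFusion API, and a synchronous iterator
replaces the async record batch stream.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for an Arrow RecordBatch: one column of i64 values.
#[derive(Debug, Clone, PartialEq)]
struct Batch(Vec<i64>);

// Simplified, synchronous stand-in for an execution plan trait.
// The real trait is partition-aware and yields an async stream of batches.
trait Plan {
    fn execute(&self) -> Box<dyn Iterator<Item = Batch> + '_>;
}

// Read side: yields batches that are already held in memory.
struct MemoryScan {
    batches: Vec<Batch>,
}

impl Plan for MemoryScan {
    fn execute(&self) -> Box<dyn Iterator<Item = Batch> + '_> {
        Box::new(self.batches.iter().cloned())
    }
}

// Write side: drains its input plan into the table's shared storage and
// reports the number of rows written as a single one-value batch.
struct MemoryWrite {
    input: Box<dyn Plan>,
    target: Arc<Mutex<Vec<Batch>>>,
}

impl Plan for MemoryWrite {
    fn execute(&self) -> Box<dyn Iterator<Item = Batch> + '_> {
        let mut rows = 0i64;
        let mut table = self.target.lock().unwrap();
        for batch in self.input.execute() {
            rows += batch.0.len() as i64;
            table.push(batch);
        }
        Box::new(std::iter::once(Batch(vec![rows])))
    }
}

// Hypothetical insert_into(): the table hands back a plan, not a stream
// or a writer, so the planner can treat writes like any other operator.
fn insert_into(target: Arc<Mutex<Vec<Batch>>>, input: Box<dyn Plan>) -> Box<dyn Plan> {
    Box::new(MemoryWrite { input, target })
}
```

   In this sketch nothing is written until the returned plan is executed, so
the write composes with the rest of the plan tree exactly like a read does.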



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
