metesynnada commented on code in PR #6049:
URL: https://github.com/apache/arrow-datafusion/pull/6049#discussion_r1179310840
##########
datafusion/core/src/physical_plan/memory.rs:
##########
@@ -223,15 +245,365 @@ impl RecordBatchStream for MemoryStream {
}
}
+/// Execution plan for writing record batches to an in-memory table.
+pub struct MemoryWriteExec {
Review Comment:
While experimenting with the single-plan idea, my main challenge was deciding
where the insert operation should execute, i.e., what the `insert_into()` API
should return. Do we return a `tokio::AsyncWrite` (one that accepts a
`RecordBatch`, for use in a `futures::stream`), a `futures::stream`, or possibly
a new trait tailored specifically to the `TableProvider`?
If the API were to return a `futures::stream` as the operator, a single
execution plan might not be possible without some workarounds. If it returned a
`tokio::AsyncWrite` instead, a single executor may be possible, though we
haven't used that approach in DataFusion before.
All in all, having a distinct `ExecutionPlan` to handle writes for each data
source mirrors the behavior on the read side and seems much more natural. What
do you think?
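
To make the last option concrete, here is a rough, std-only sketch of what I mean by a distinct write plan per data source. Everything in it is a simplified stand-in rather than the actual DataFusion API: `Batch` replaces `RecordBatch`, the synchronous `Plan` trait replaces `ExecutionPlan` (which really returns an async stream per partition), and `Provider`, `MemTable`, `MemScan`, `MemWrite` and `Values` are hypothetical names chosen for illustration.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for Arrow's `RecordBatch`: a batch is a single column of integers.
type Batch = Vec<i64>;

/// Simplified, synchronous stand-in for `ExecutionPlan` (assumption:
/// the real trait is async and partitioned; this one is neither).
trait Plan {
    fn execute(&self) -> Vec<Batch>;
}

/// Hypothetical provider trait: both the read and the write side return a plan.
trait Provider {
    fn scan(&self) -> Box<dyn Plan>;
    fn insert_into(&self, input: Box<dyn Plan>) -> Box<dyn Plan>;
}

/// In-memory table whose storage is shared with the plans it creates.
struct MemTable {
    batches: Arc<Mutex<Vec<Batch>>>,
}

impl MemTable {
    fn new() -> Self {
        MemTable { batches: Arc::new(Mutex::new(Vec::new())) }
    }
}

/// Read-side plan: returns a copy of the stored batches.
struct MemScan {
    batches: Arc<Mutex<Vec<Batch>>>,
}

impl Plan for MemScan {
    fn execute(&self) -> Vec<Batch> {
        self.batches.lock().unwrap().clone()
    }
}

/// Write-side plan: drains its input, appends it to the table,
/// and reports the number of rows written as a one-value batch.
struct MemWrite {
    input: Box<dyn Plan>,
    target: Arc<Mutex<Vec<Batch>>>,
}

impl Plan for MemWrite {
    fn execute(&self) -> Vec<Batch> {
        // Run the input first so the target lock is not held while it executes.
        let incoming = self.input.execute();
        let rows: i64 = incoming.iter().map(|b| b.len() as i64).sum();
        self.target.lock().unwrap().extend(incoming);
        vec![vec![rows]]
    }
}

impl Provider for MemTable {
    fn scan(&self) -> Box<dyn Plan> {
        Box::new(MemScan { batches: Arc::clone(&self.batches) })
    }

    fn insert_into(&self, input: Box<dyn Plan>) -> Box<dyn Plan> {
        Box::new(MemWrite { input, target: Arc::clone(&self.batches) })
    }
}

/// Literal input plan, used only to feed the write plan in examples.
struct Values(Vec<Batch>);

impl Plan for Values {
    fn execute(&self) -> Vec<Batch> {
        self.0.clone()
    }
}
```

In this sketch nothing is written when `insert_into()` is called; the write happens only when the returned plan is executed, which is the same laziness we get from `scan()` on the read side.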