alamb commented on issue #5076:
URL:
https://github.com/apache/arrow-datafusion/issues/5076#issuecomment-1520521168
I think we should stay true to the design goal of DataFusion and keep this
functionality as modular as possible (i.e. implemented in terms of traits that
can be extended by other systems).
Here are some ideas:
# Idea: Add a physical plan for LogicalPlan::Dml
(I think this is what @andygrove is suggesting).
This would add a way to create a physical plan for `LogicalPlan::Dml(.. op:
Insert)` and have that implementation call the appropriate (TBD) methods on
`TableProvider` that would handle writing. This is similar to what @metesynnada
proposes in https://github.com/apache/arrow-datafusion/pull/6049 though it is
not mem table specific.
The upside here is that we already have the whole planning and execution flow,
and it would follow the pattern of systems like Spark (e.g.
[DataWritingCommandExec](https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkPlan-DataWritingCommandExec.html)
-- thanks to @metesynnada for the link).
The downside is that such an ExecutionPlan is kind of strange: it produces no
output, so most of its methods (like "output ordering") are basically useless,
as I mentioned on
https://github.com/apache/arrow-datafusion/pull/6049#pullrequestreview-1396922679
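To make the shape of this idea concrete, here is a rough, self-contained sketch. It does not use the real DataFusion types: `Batch`, `ExecutionPlan`, and `TableProvider` below are simplified stand-ins, and `InsertExec`, `ValuesExec`, `ToyTable`, and the `write_batches` method are hypothetical names invented for illustration (the actual write methods are still TBD). It mainly shows why `output_ordering` has nothing useful to report on such a node.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for an Arrow record batch: here, just a list of values.
type Batch = Vec<i64>;

/// Simplified stand-in for DataFusion's `TableProvider` trait.
trait TableProvider {
    /// Hypothetical write hook (the "TBD" method); not an existing API.
    fn write_batches(&self, batches: Vec<Batch>) -> Result<(), String>;
}

/// Simplified stand-in for DataFusion's `ExecutionPlan` trait.
trait ExecutionPlan {
    /// Columns the output is sorted by, if any.
    fn output_ordering(&self) -> Option<Vec<String>>;
    /// Runs the plan and returns its output batches.
    fn execute(&self) -> Result<Vec<Batch>, String>;
}

/// Leaf plan that yields fixed data, standing in for the query being inserted.
struct ValuesExec {
    batches: Vec<Batch>,
}

impl ExecutionPlan for ValuesExec {
    fn output_ordering(&self) -> Option<Vec<String>> {
        None
    }
    fn execute(&self) -> Result<Vec<Batch>, String> {
        Ok(self.batches.clone())
    }
}

/// Hypothetical physical node for `LogicalPlan::Dml(.. op: Insert)`.
struct InsertExec {
    input: Arc<dyn ExecutionPlan>,
    target: Arc<dyn TableProvider>,
}

impl ExecutionPlan for InsertExec {
    // The node produces no rows, so there is nothing to order.
    fn output_ordering(&self) -> Option<Vec<String>> {
        None
    }
    fn execute(&self) -> Result<Vec<Batch>, String> {
        let batches = self.input.execute()?;
        self.target.write_batches(batches)?;
        Ok(Vec::new()) // all the work is the side effect on `target`
    }
}

/// Toy in-memory table used only to exercise the sketch.
struct ToyTable {
    rows: Mutex<Vec<i64>>,
}

impl TableProvider for ToyTable {
    fn write_batches(&self, batches: Vec<Batch>) -> Result<(), String> {
        let mut rows = self.rows.lock().map_err(|e| e.to_string())?;
        rows.extend(batches.into_iter().flatten());
        Ok(())
    }
}
```

In this sketch the planner would only need to wrap the input plan in an `InsertExec`; everything storage-specific stays behind the trait method.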
# Idea: Add specific runner / executor for Inserts / Update / Deletes
Maybe we could provide a function or struct `run_insert(source: Arc<dyn
ExecutionPlan>, target: Arc<dyn TableProvider>)` that would orchestrate:
1. Running the execution plan
2. Calling the appropriate (TBD) methods on `TableProvider` that would handle
writing
Here is how you might run it:
```rust
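// Hypothetical API: `source` is the `Arc<dyn ExecutionPlan>` producing the rows,
// `my_table` is the `Arc<dyn TableProvider>` being written to.
let runner = Insert::new(context)
    .target(my_table)
    .run(source)?;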
```
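As a rough illustration of the two orchestration steps, a free-function version of `run_insert` might look like the sketch below. Again this is self-contained and does not use the real DataFusion traits: `Batch`, `ExecutionPlan`, and `TableProvider` are simplified stand-ins, and `write_batch`, `ValuesExec`, and `ToyTable` are hypothetical names chosen for the example. Returning the number of rows written is also just an assumption.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for an Arrow record batch: here, just a list of values.
type Batch = Vec<i64>;

/// Simplified stand-in for DataFusion's `ExecutionPlan` trait.
trait ExecutionPlan {
    fn execute(&self) -> Result<Vec<Batch>, String>;
}

/// Simplified stand-in for DataFusion's `TableProvider` trait.
trait TableProvider {
    /// Hypothetical write hook (the "TBD" method); not an existing API.
    fn write_batch(&self, batch: Batch) -> Result<(), String>;
}

/// Orchestrates an insert and returns the number of rows written.
fn run_insert(
    source: Arc<dyn ExecutionPlan>,
    target: Arc<dyn TableProvider>,
) -> Result<usize, String> {
    let mut written = 0;
    // Step 1: run the execution plan.
    for batch in source.execute()? {
        let rows = batch.len();
        // Step 2: hand each batch to the table's write method.
        target.write_batch(batch)?;
        written += rows;
    }
    Ok(written)
}

/// Plan that yields fixed data, standing in for the query being inserted.
struct ValuesExec(Vec<Batch>);

impl ExecutionPlan for ValuesExec {
    fn execute(&self) -> Result<Vec<Batch>, String> {
        Ok(self.0.clone())
    }
}

/// Toy in-memory table used only to exercise the sketch.
struct ToyTable {
    rows: Mutex<Vec<i64>>,
}

impl TableProvider for ToyTable {
    fn write_batch(&self, batch: Batch) -> Result<(), String> {
        let mut rows = self.rows.lock().map_err(|e| e.to_string())?;
        rows.extend(batch);
        Ok(())
    }
}
```

Note that the source plan here is an ordinary query plan with ordinary output, so no "output-less" ExecutionPlan is needed; the write step lives entirely outside the plan tree.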
A benefit here is that only systems that want to handle DML would need to
invoke the inserter.
A downside is that it would require more code and wiring to work.
Maybe @avantgardnerio has some thoughts in this area, as I think he has a
system based on DataFusion that does DML as well.