alamb commented on issue #5076:
URL: 
https://github.com/apache/arrow-datafusion/issues/5076#issuecomment-1520521168

   I think we should stay true to DataFusion's design goal and keep this 
functionality as modular as possible (i.e., implemented in terms of traits that 
other systems can extend).
   
   Here are some ideas:
   
   # Idea: Add a physical plan for LogicalPlan::Dml
   (I think this is what @andygrove is suggesting.)
   
   This would add a way to create a physical plan for `LogicalPlan::Dml(.. op: 
Insert)` and have that implementation call the appropriate (TBD) methods on 
`TableProvider` that would handle writing. This is similar to what @metesynnada 
proposes in https://github.com/apache/arrow-datafusion/pull/6049, though it is 
not specific to the mem table.
   
   The upside here is that we already have the whole planning flow in place, and 
it would follow the pattern of systems like Spark (e.g. 
[DataWritingCommandExec](https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkPlan-DataWritingCommandExec.html)
 -- thanks to @metesynnada for the link).
   
   The downside is that such an `ExecutionPlan` is kind of strange (it produces 
no output, so most of the methods, like "output ordering", are basically 
useless), as I mentioned on 
https://github.com/apache/arrow-datafusion/pull/6049#pullrequestreview-1396922679
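   
   To make the shape of this idea concrete, here is a minimal sketch. It does 
not use DataFusion's real traits: `ExecutionPlan` and `TableProvider` below are 
simplified stand-ins, `Batch` replaces Arrow's `RecordBatch`, and `write_batch`, 
`InsertExec`, `ValuesExec`, and `MemTable` are hypothetical names chosen only 
for illustration (the actual write methods are still TBD).
   
   ```rust
   use std::sync::{Arc, Mutex};
   
   // Stand-in for an Arrow RecordBatch (assumption: a plain column of i64).
   type Batch = Vec<i64>;
   
   // Simplified stand-in for DataFusion's `ExecutionPlan`.
   trait ExecutionPlan {
       fn execute(&self) -> Result<Vec<Batch>, String>;
   }
   
   // Simplified stand-in for `TableProvider`; `write_batch` is a hypothetical
   // name for the (TBD) write method and returns the number of rows written.
   trait TableProvider {
       fn write_batch(&self, batch: Batch) -> Result<usize, String>;
   }
   
   // A source plan that yields a fixed set of batches.
   struct ValuesExec {
       batches: Vec<Batch>,
   }
   
   impl ExecutionPlan for ValuesExec {
       fn execute(&self) -> Result<Vec<Batch>, String> {
           Ok(self.batches.clone())
       }
   }
   
   // Hypothetical physical node for `Dml { op: Insert }`: it drains its input
   // and forwards every batch to the target table.
   struct InsertExec {
       input: Arc<dyn ExecutionPlan>,
       target: Arc<dyn TableProvider>,
   }
   
   impl ExecutionPlan for InsertExec {
       fn execute(&self) -> Result<Vec<Batch>, String> {
           for batch in self.input.execute()? {
               self.target.write_batch(batch)?;
           }
           // The node itself produces no output batches.
           Ok(Vec::new())
       }
   }
   
   // Toy in-memory table used as the insert target.
   struct MemTable {
       rows: Mutex<Vec<i64>>,
   }
   
   impl TableProvider for MemTable {
       fn write_batch(&self, batch: Batch) -> Result<usize, String> {
           let n = batch.len();
           self.rows.lock().map_err(|e| e.to_string())?.extend(batch);
           Ok(n)
       }
   }
   
   // Runs the insert node and returns (number of output batches, stored rows).
   fn insert_exec_demo() -> Result<(usize, Vec<i64>), String> {
       let table = Arc::new(MemTable { rows: Mutex::new(Vec::new()) });
       let plan = InsertExec {
           input: Arc::new(ValuesExec { batches: vec![vec![1, 2], vec![3]] }),
           target: table.clone(),
       };
       let output = plan.execute()?;
       let stored = table.rows.lock().map_err(|e| e.to_string())?.clone();
       Ok((output.len(), stored))
   }
   ```
   
   In this sketch the insert node returns an empty result, which illustrates 
why plan properties such as output ordering carry little meaning for it.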
   
   # Idea: Add a specific runner / executor for Inserts / Updates / Deletes
   Maybe we could provide a function or struct, such as `run_insert(source: 
Arc<dyn ExecutionPlan>, target: Arc<dyn TableProvider>)`, that would orchestrate:
   
   1. Running the execution plan
   2. Calling the appropriate (TBD) methods on `TableProvider` that would handle 
writing
   
   Here is how you might run it:
   
   ```rust
   // Hypothetical API: `source` is the ExecutionPlan producing the rows to insert
   let result = Insert::new(context)
     .target(my_table)
     .run(source)?;
   ```
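   
   A rough sketch of how such a runner could be put together is below. As 
before, nothing here is DataFusion's actual API: the two traits are simplified 
stand-ins, `Context` is an empty placeholder, `write_batch` is an assumed name 
for the (TBD) write method, and the plan handed to `run` is taken to be the 
source of the rows.
   
   ```rust
   use std::sync::{Arc, Mutex};
   
   // Stand-in for an Arrow RecordBatch (assumption: a plain column of i64).
   type Batch = Vec<i64>;
   
   // Simplified stand-ins for DataFusion's traits; `write_batch` is hypothetical.
   trait ExecutionPlan {
       fn execute(&self) -> Result<Vec<Batch>, String>;
   }
   
   trait TableProvider {
       fn write_batch(&self, batch: Batch) -> Result<usize, String>;
   }
   
   // Placeholder for a session / task context; unused in this sketch.
   struct Context;
   
   // Step 1: run the plan. Step 2: hand each batch to the target table.
   // Returns the total number of rows written.
   fn run_insert(
       source: Arc<dyn ExecutionPlan>,
       target: Arc<dyn TableProvider>,
   ) -> Result<usize, String> {
       let mut rows = 0;
       for batch in source.execute()? {
           rows += target.write_batch(batch)?;
       }
       Ok(rows)
   }
   
   // Builder-style wrapper matching the `Insert::new(..).target(..).run(..)` shape.
   struct Insert {
       _context: Context,
       target: Option<Arc<dyn TableProvider>>,
   }
   
   impl Insert {
       fn new(context: Context) -> Self {
           Insert { _context: context, target: None }
       }
   
       fn target(mut self, table: Arc<dyn TableProvider>) -> Self {
           self.target = Some(table);
           self
       }
   
       fn run(self, source: Arc<dyn ExecutionPlan>) -> Result<usize, String> {
           let target = self.target.ok_or_else(|| "no target table set".to_string())?;
           run_insert(source, target)
       }
   }
   
   // Toy source plan and in-memory target used to exercise the runner.
   struct ValuesExec {
       batches: Vec<Batch>,
   }
   
   impl ExecutionPlan for ValuesExec {
       fn execute(&self) -> Result<Vec<Batch>, String> {
           Ok(self.batches.clone())
       }
   }
   
   #[derive(Default)]
   struct MemTable {
       rows: Mutex<Vec<i64>>,
   }
   
   impl TableProvider for MemTable {
       fn write_batch(&self, batch: Batch) -> Result<usize, String> {
           let n = batch.len();
           self.rows.lock().map_err(|e| e.to_string())?.extend(batch);
           Ok(n)
       }
   }
   
   // Runs an insert and returns (rows written, rows now stored in the table).
   fn run_insert_demo() -> Result<(usize, Vec<i64>), String> {
       let my_table = Arc::new(MemTable::default());
       let source = Arc::new(ValuesExec { batches: vec![vec![10, 20], vec![30]] });
       let written = Insert::new(Context).target(my_table.clone()).run(source)?;
       let stored = my_table.rows.lock().map_err(|e| e.to_string())?.clone();
       Ok((written, stored))
   }
   ```
   
   Because the write path lives entirely outside the `ExecutionPlan` trait 
here, the source plan stays an ordinary query plan and only callers that want 
DML need to touch `Insert` at all.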
   
   A benefit here is that only systems that want to handle DML would invoke 
the inserter.
   
   A downside is that it would require more code / connections to work.
   
   
   Maybe @avantgardnerio has some thoughts in this area, as I think he has a 
DataFusion-based system that also does DML.
   
   

