alamb commented on PR #6049:
URL: 
https://github.com/apache/arrow-datafusion/pull/6049#issuecomment-1520524109

   > If you believe that simplifying the PR without losing functionality is the 
best course of action, I am willing to make the necessary changes.
   
   This is my preference -- and if/when we find that having an `ExecutionPlan` 
that implements the write logic, we can perhaps revive that part of this PR
   
   >  However, I would appreciate further discussion on how to align this PR 
with the goals outlined in issue 
https://github.com/apache/arrow-datafusion/issues/5076 while addressing your 
concerns about the resulting ExecutionPlan.
   
   My high level thinking is that a TableProvider should be a "Sink" that takes 
streams from executing
   
   After thinking about it more, I think an `ExecutionPlan` for implementing 
Inserts might be ok for the reasons you mention. 
   
   However,  concerns I have with the `MemoryWriteExec` in this PR is:
   1. It is tightly bound to the specific `MemTable` TableProvider but is in 
the general purpose physical plan area -- it seems the MemTable specific 
functionality should stay near / within the `datasource` module. 
   2. Most of the logic in `MemoryWriteExec` seems like it is general purpose 
-- only the actual `RwLock<Vec<RecordBatch>>>>` seems specific to Memory
   
   My reading of https://github.com/apache/arrow-datafusion/issues/5076 and 
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-SparkPlan-DataWritingCommandExec.html
 suggest that there is single specific physical operator that handles writes to 
arbitrary datasources which is how I would prefer DataFusion proceeds. 
   
   I left some additional comments about this plan on 
https://github.com/apache/arrow-datafusion/issues/5076#issuecomment-1520521168


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to