yjshen edited a comment on issue #2079:
URL: 
https://github.com/apache/arrow-datafusion/issues/2079#issuecomment-1083205518


   > Translation of a LogicalPlan::TableScan into a corresponding ExecutionPlan
   
   Sounds great. We could eliminate much of the common logic scattered in 
different file formats, as you mentioned. And yes, this makes sense!
   
   >  removing the ability to scan multiple files within a single operator, and 
instead composing multiple per-file operators in the generated ExecutionPlan.
   
   I'm confused here: How would we parallelize the physical operators after 
this `TableScanOp`? And how do we control this `single` operators parallelism? 
Will the data batch be sent through a multi-producer(the single 
ops)-multi-consumer(like filter/project/sort) queue?
   
   >  compose the SendableRecordBatchStream together. I think making this more 
explicit sounds like a good idea, but I'm not sure it is a simple case of 
grouping operators together, and would likely involve altering the contract of 
ExecutionPlan to be more explicit about what operators spawn additional 
parallel tasks. Perhaps this is what you're proposing?
   
   What do you think if we avoid `async` and `stream` from normal operators' 
`execute()`? just like you and @alamb mentioned merging project/filter logic 
into scan operator, these pure computations are async free in my opinion, they 
are data or computation-intensive and do not talk with IO systems. What do you 
think if we have (similar to that of 
[Morsel-driven](https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf)):
 
   
   <img width="847" alt="image" 
src="https://user-images.githubusercontent.com/1387718/160857783-cd666871-8063-433a-92b3-98cc097ca117.png";>
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to