alamb commented on issue #30:
URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1274655630

   Regarding pipelined execution (at least a push-based, pipelined 
execution model), I wanted to point out that @tustvold investigated this 
approach in DataFusion (and figured out a way to reuse the current operators). 
See https://github.com/apache/arrow-datafusion/pull/2226, which added a 
scheduler under a feature flag.
   
   Our eventual goal is to support running a plan on hundreds of Parquet files 
without having to fetch them all beforehand (or concurrently). However, other 
things are currently blocking this goal, so additional work on the 
scheduler is on hold for now.
   
   You can find more detail in 
https://github.com/apache/arrow-datafusion/issues/2504
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
