alamb commented on issue #30: URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1274655630
In terms of pipeline execution (at least in terms of a push-based, pipelined execution model), I wanted to point out that @tustvold investigated this approach in DataFusion (and figured out a way to reuse the current operators).

See https://github.com/apache/arrow-datafusion/pull/2226, which added a scheduler under a feature flag.

Our eventual goal is to support running a plan on hundreds of Parquet files without having to fetch them all beforehand (or concurrently). However, other things are currently blocking this goal, so additional work on the scheduler is on hold for now.

You can find more detail in https://github.com/apache/arrow-datafusion/issues/2504