andygrove commented on pull request #8283: URL: https://github.com/apache/arrow/pull/8283#issuecomment-726811184
@alamb That is a good question. I have been too busy at work lately to work on Arrow/DataFusion/Ballista, but I have been spending some time contemplating where to go next. This scheduler prototype is interesting, but until we have partitioning, shuffles, joins, and async working smoothly, perhaps there isn't much point working on the scheduler yet. I would be ok with closing this PR for now. It also might be premature for me to try to contribute a scheduler to DataFusion, since I am really just prototyping this right now and lack experience in this area.

If we had partitioning, shuffles, and joins in DataFusion, we could run a much wider range of TPC-H queries on a single node. We could also build some nice command line utilities for converting data sets from CSV and Parquet with repartitioning, which would be quite compelling for a lot of people IMO and could attract new contributors.
