[GitHub] [arrow] andygrove commented on pull request #8283: ARROW-9707: [Rust] [DataFusion] DataFusion Scheduler Prototype [WIP]

GitBox Fri, 13 Nov 2020 06:59:23 -0800


andygrove commented on pull request #8283:
URL: https://github.com/apache/arrow/pull/8283#issuecomment-726811184



   @alamb That is a good question. I have been too busy at work lately to work 
on Arrow/DataFusion/Ballista but I have been spending some time contemplating 
where to go next.
   
   This scheduler prototype is interesting, but until we have partitioning, 
shuffles, joins, and async working smoothly, there probably isn't much point 
in working on the scheduler yet. I would be OK with closing this PR for now. 
It might also be premature for me to try to contribute a scheduler to 
DataFusion, since I am really just prototyping this right now and lack 
experience in this area.
   
   If we had partitioning, shuffles, and joins in DataFusion, we could run a 
much wider range of TPC-H queries on a single node. We could also build some 
nice command-line utilities for converting data sets from CSV and Parquet 
with repartitioning, which would be quite compelling for a lot of people IMO 
and could attract new contributors.


