[GitHub] [arrow-datafusion] mingmwang commented on issue #1221: Task assignment between Scheduler and Executors

GitBox Sun, 14 Nov 2021 19:51:41 -0800


mingmwang commented on issue #1221:
URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-968512095



   > Regarding shuffling, I saw in some benchmarks for [TiDB's distributed 
query engine](https://youtu.be/mmzoSkEhYrA?t=3248) (incidentally also relying 
on columnar storage) that an MPP style shuffle seemed to produce better results 
than map reduce style of Apache Spark. I think there are some open questions, 
such as whether Java could be the cause of this discrepancy. But maybe it's 
also worth thinking about how to optimize the shuffles.
   > 
   > I don't know enough about DataFusion to know if it takes into account data 
movement when generating query plans.
   
   Actually, I'm working on an MPP-style shuffle implementation. Most of the 
coding is done and I'm now testing it. 
   I'm not sure whether the community needs this feature.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

