jon-chuang commented on issue #1221:
URL: 
https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-968350452


   Regarding shuffling, I saw in some benchmarks for [TiDB's distributed query 
engine](https://www.youtube.com/watch?v=mmzoSkEhYrA) (incidentally also relying 
on columnar storage) that an MPP-style shuffle seemed to produce better results 
than the map-reduce-style shuffle of Apache Spark. There are some open questions, 
such as whether Java could be the cause of this discrepancy, but it may 
also be worth thinking about how to optimize the shuffles.
   
   I don't know enough about DataFusion to know whether it takes data 
movement into account when generating query plans.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

