mingmwang commented on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-968512095
> Regarding shuffling, I saw in some benchmarks for [TiDB's distributed query engine](https://youtu.be/mmzoSkEhYrA?t=3248) (incidentally also relying on columnar storage) that an MPP style shuffle seemed to produce better results than map reduce style of Apache Spark. I think there are some open questions, such as whether Java could be the cause of this discrepancy. But maybe it's also worth thinking about how to optimize the shuffles.
>
> I don't know enough about DataFusion to know if it takes into account data movement when generating query plans.

Actually, I'm working on an MPP-style shuffle implementation. Most of the coding is done and I'm now testing it. I'm not sure whether the community needs this feature or not.
