marin-ma commented on PR #11722: URL: https://github.com/apache/gluten/pull/11722#issuecomment-4037894211
@guowangy In general, random I/O is considered a bottleneck in shuffle, which is why so many remote shuffle service projects, such as Celeborn and Uniffle, aim to address it. A remote shuffle service usually coalesces the shuffle outputs on the mapper side to reduce random I/O. However, the design in this PR seems to go in the opposite direction, since it may introduce more random I/O during reads. Writing the segments directly to the data file would make the partition writer logic simpler, but we intentionally didn't choose that approach because of the above consideration.

I'm not sure whether your test was run on a single node or on a cluster. If it was on a single node where disk I/O is not the bottleneck, then the solution may not be practical in real use cases.

Besides, in our experience, the external shuffle service is usually enabled in real production environments because it provides better stability when an executor process goes down. It is effectively a must-have feature that the shuffle framework should support.
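
To make the random I/O concern concrete, here is a rough back-of-the-envelope sketch. It is only a counting model, not code from this PR or from Gluten: the function names and the `segments_per_partition` parameter are hypothetical, and it assumes each non-contiguous segment of a partition costs the reducer one separate ranged read.

```python
def ranged_reads_per_reducer(num_mappers: int, segments_per_partition: int = 1) -> int:
    """Count the non-contiguous byte ranges one reducer has to fetch.

    segments_per_partition == 1 models a map output file in which each
    partition has been merged into a single contiguous range.
    Larger values model segments appended to the data file as they are
    produced, so one partition is scattered across the file.
    (Hypothetical model; both parameters are illustrative assumptions.)
    """
    if num_mappers < 0 or segments_per_partition < 1:
        raise ValueError("num_mappers must be >= 0 and segments_per_partition >= 1")
    return num_mappers * segments_per_partition


def total_ranged_reads(num_mappers: int, num_reducers: int,
                       segments_per_partition: int = 1) -> int:
    """Total ranged reads across all reducers under the same model."""
    return num_reducers * ranged_reads_per_reducer(num_mappers, segments_per_partition)


if __name__ == "__main__":
    # Compare a coalesced layout with a layout that has 4 segments per partition.
    print(total_ranged_reads(1000, 200, segments_per_partition=1))
    print(total_ranged_reads(1000, 200, segments_per_partition=4))
```

Under this model the number of reads grows linearly with the number of segments per partition, which matters most when the disk or the shuffle service, rather than CPU, is the limiting resource.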
