marin-ma commented on issue #11542: URL: https://github.com/apache/incubator-gluten/issues/11542#issuecomment-3840699433
After reading the code, I found that the hash shuffle writer + RSS path allocates extra memory while spilling, whereas the sort shuffle writer + RSS path does not. The two writers compress differently: the hash shuffle writer compresses each column buffer separately before sending it to RSS, while the sort shuffle writer uses streaming compression on one fixed-size block at a time.

To avoid triggering a spill in the hash shuffle writer + RSS, we may need to skip compression when the evict is itself triggered by another spill. The proposed modification is here: https://github.com/marin-ma/gluten/commit/105b78fdeb5134135482a36eb212d27e52e13fa1

For the sort shuffle writer, we still need to identify why memory consumption is high enough that YARN eventually kills the executor.
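
To illustrate the idea of skipping compression when an evict is triggered by another spill, here is a minimal Python sketch. It is not the Gluten code and not the contents of the linked commit: `Payload`, `evict_partition`, and the `triggered_by_spill` flag are hypothetical names, and `zlib` stands in for whatever codec the shuffle writer is configured with. It assumes the receiver can tell compressed and uncompressed payloads apart through a per-payload flag.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Payload:
    compressed: bool  # lets the reader know whether to decompress (assumed wire flag)
    data: bytes


def evict_partition(
    column_buffers: List[bytes], triggered_by_spill: bool
) -> Tuple[List[Payload], int]:
    """Return the payloads to push to RSS and the extra bytes allocated for them."""
    if triggered_by_spill:
        # Memory is already under pressure: forward the buffers as-is so that
        # no compression output buffers have to be allocated.
        return [Payload(False, buf) for buf in column_buffers], 0

    # Normal evict: compress each column buffer separately, which needs one
    # freshly allocated output buffer per column.
    payloads = [Payload(True, zlib.compress(buf)) for buf in column_buffers]
    extra_bytes = sum(len(p.data) for p in payloads)
    return payloads, extra_bytes
```

The trade-off in this sketch is that data evicted during a spill goes to RSS uncompressed, which costs more network and storage but avoids the allocation that could trigger a nested spill.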
