marin-ma commented on issue #11542:
URL: 
https://github.com/apache/incubator-gluten/issues/11542#issuecomment-3840699433

   Looking at the code, I found that the hash shuffle writer + RSS path makes 
an extra memory allocation when spilling, which does not apply to the sort 
shuffle writer + RSS path. The two writers compress data differently. The hash 
shuffle writer compresses each column buffer separately before sending it to 
RSS, whereas the sort shuffle writer uses streaming compression and compresses 
one fixed-size block at a time.
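
   To illustrate the difference between the two compression styles, here is a 
minimal Python sketch. It is not Gluten code: `zlib` stands in for whatever 
codec is configured, and the function names and the 64 KiB block size are 
hypothetical. It only shows that per-buffer compression produces one 
independent output per column buffer, while streaming compression pushes 
fixed-size blocks through a single compressor.

```python
import zlib


def compress_per_buffer(column_buffers):
    """Hash-writer style (illustrative): compress each column buffer on its own.

    Each call to zlib.compress returns a separate output, so there is one
    compressed result per input buffer.
    """
    return [zlib.compress(buf) for buf in column_buffers]


def compress_streaming(payload, block_size=64 * 1024):
    """Sort-writer style (illustrative): one compressor, fixed-size input blocks.

    block_size is an assumed value chosen for the example only.
    """
    compressor = zlib.compressobj()
    chunks = []
    for offset in range(0, len(payload), block_size):
        # Feed one fixed-size block at a time into the same stream.
        chunks.append(compressor.compress(payload[offset:offset + block_size]))
    # Flush whatever the compressor is still holding internally.
    chunks.append(compressor.flush())
    return b"".join(chunks)
```

   In the streaming variant the input handed to the compressor at any one time 
is bounded by the block size, whereas the per-buffer variant works on a whole 
column buffer at once.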
   
   To avoid triggering a spill in the hash shuffle writer + RSS path, we may 
need to skip compression when the evict is itself triggered by another spill. 
The modification looks like this: 
https://github.com/marin-ma/gluten/commit/105b78fdeb5134135482a36eb212d27e52e13fa1
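
   The idea can be sketched roughly as follows. This is not the code from the 
commit above: the class, the `triggered_by_spill` flag, and the `sent` list 
are hypothetical names used only for illustration, and `zlib` again stands in 
for the real codec. The assumption is that compressing during a 
spill-triggered evict would need a fresh output allocation at exactly the 
moment memory is being reclaimed.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class PartitionWriterSketch:
    """Hypothetical stand-in for a writer that pushes buffers to RSS."""

    # Records (is_compressed, payload) pairs in place of a real RSS client.
    sent: list = field(default_factory=list)

    def evict(self, buffers, triggered_by_spill):
        for buf in buffers:
            if triggered_by_spill:
                # Memory is already under pressure: send the raw bytes and
                # avoid allocating a compression output buffer.
                self.sent.append((False, bytes(buf)))
            else:
                # Normal eviction: compress each buffer before sending.
                self.sent.append((True, zlib.compress(buf)))
```

   Each payload carries a flag saying whether it is compressed, since a reader 
would need to know which payloads to decompress.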
 
   
   For the sort shuffle writer, we still need to identify why memory 
consumption is so high that YARN eventually kills the executor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

