marin-ma commented on issue #11542:
URL: 
https://github.com/apache/incubator-gluten/issues/11542#issuecomment-3876407763

   > a large amount of memory is held by VeloxSortShuffleWriter.pages_.
   
   @wForget Thanks for sharing the details. In the sort shuffle writer, holding 
a large number of pages is expected behaviour, but the pages are also expected 
to be spilled to free memory when the memory manager fails to allocate new 
memory. In your case the executor is killed by YARN before a spill is 
triggered, which means the overhead memory exceeded its limit. Could you try 
the following steps for the sort shuffle writer:
   1. Increase `spark.executor.memoryOverhead` to see whether the query can 
pass and whether the shuffle writer spill can be triggered.
   2. If so, could you also try dumping the jemalloc stats when 
`VeloxUniffleColumnarShuffleWriter.spill` is called? It may help show which 
parts hold so much overhead memory.
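
   For step 1, a rough sketch of how much headroom a larger 
`spark.executor.memoryOverhead` buys. It assumes Spark 3.x on YARN, where the 
container request is the executor heap plus off-heap size plus overhead, and 
the default overhead is 10% of the heap with a 384 MiB floor. The function 
name and the sizes in the usage note are hypothetical, not taken from this 
issue.

```python
# Sketch only: estimates the YARN container size Spark requests per executor.
# Assumes Spark 3.x defaults (overhead = max(384 MiB, 10% of heap)) and that
# spark.memory.offHeap.size is added to the request; PySpark memory is ignored.
MIN_OVERHEAD_MIB = 384
DEFAULT_OVERHEAD_FACTOR = 0.10


def container_request_mib(executor_memory_mib, memory_overhead_mib=None,
                          offheap_mib=0,
                          overhead_factor=DEFAULT_OVERHEAD_FACTOR):
    """Return the approximate container size in MiB for one executor."""
    if memory_overhead_mib is None:
        # No explicit spark.executor.memoryOverhead: fall back to the default.
        memory_overhead_mib = max(MIN_OVERHEAD_MIB,
                                  int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + offheap_mib + memory_overhead_mib


if __name__ == "__main__":
    # Hypothetical sizes: 4 GiB heap, 8 GiB off-heap.
    default_limit = container_request_mib(4096, offheap_mib=8192)
    raised_limit = container_request_mib(4096, memory_overhead_mib=4096,
                                         offheap_mib=8192)
    print("default overhead:", default_limit, "MiB")
    print("overhead = 4 GiB:", raised_limit, "MiB")
```

   With these hypothetical sizes the default leaves only about 409 MiB for 
memory that is neither heap nor Spark-managed off-heap, so an allocation 
outside the memory manager can hit the YARN limit before any spill is 
requested.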


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

