marin-ma commented on issue #11542: URL: https://github.com/apache/incubator-gluten/issues/11542#issuecomment-3876407763
> a large amount of memory is held by VeloxSortShuffleWriter.pages_.

@wForget Thanks for sharing the details. In the sort shuffle writer, holding a large number of pages is the expected behaviour, but the writer is also expected to spill those pages and free the memory when the memory manager fails to allocate new memory. In your case the executor is killed by YARN before a spill is triggered, which means the overhead memory exceeds its limit.

Could you try the following steps for the sort shuffle writer:

1. Increase `spark.executor.memoryOverhead` to see whether the query can pass and whether the shuffle writer spill is triggered.
2. If so, could you also try dumping the jemalloc stats when `VeloxUniffleColumnarShuffleWriter.spill` is called? It may help to see which parts hold so much overhead memory.
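
For step 1, a rough sketch of how a larger overhead value might be chosen and passed to `spark-submit`. It assumes Spark's documented defaults for JVM executors (overhead factor 0.10, minimum 384 MiB). The 8 GiB executor size and the choice to double the default are purely illustrative, and the helper names are hypothetical.

```python
# Sketch only: estimates Spark's default executor memory overhead and
# builds a --conf flag with a larger value. Defaults assumed from the
# Spark configuration docs; the sizes below are illustrative.

MIN_OVERHEAD_MIB = 384   # documented minimum for spark.executor.memoryOverhead
DEFAULT_FACTOR = 0.10    # documented default overhead factor for JVM executors


def default_overhead_mib(executor_memory_mib, factor=DEFAULT_FACTOR):
    """Overhead Spark would request when memoryOverhead is not set."""
    return max(MIN_OVERHEAD_MIB, int(executor_memory_mib * factor))


def overhead_conf(overhead_mib):
    """spark-submit arguments that set the overhead explicitly (in MiB)."""
    return ["--conf", f"spark.executor.memoryOverhead={overhead_mib}m"]


if __name__ == "__main__":
    current = default_overhead_mib(8192)  # hypothetical 8 GiB executor
    raised = current * 2                  # illustrative: double the default
    print(" ".join(overhead_conf(raised)))
```

If the query passes with the larger value, the shuffle writer logs should show whether spill was actually triggered before moving on to step 2.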
