FelixYBW commented on issue #4392: URL: https://github.com/apache/incubator-gluten/issues/4392#issuecomment-3392636857
> IIRC Spark sort-based shuffle is a heavy operator that remains on-heap when off-heap is on. I would be glad to do some path-finding to see if we can somehow fix this.
>
> [@FelixYBW](https://github.com/FelixYBW) Would you also like to help locate where the remaining on-heap consumption comes from? E.g., you could try setting `spark.shuffle.sort.bypassMergeThreshold = 2147483647` to disable vanilla sort-based shuffle, then see if the on-heap consumption of vanilla Spark can be reduced. Thanks.

It's not the shuffle. Judging from the workload, it's the sort operator. We enabled off-heap memory but configured a small off-heap size and a large on-heap size; the spill is triggered even though the on-heap memory is more than enough for the sort.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
