Yohahaha commented on issue #8018:
URL: 
https://github.com/apache/incubator-gluten/issues/8018#issuecomment-2495794197

   Thank you for proposing this great idea. Glad to see the POC has already 
shown benefits in your prod env!
   
   > Meets the stage level resource conditions
   > 1. executor dynamic allocation is enabled, spark.dynamicAllocation.enabled 
must be true
   > 2. Underlying resource scheduler must support dynamic executor allocation
   
   For me, the most interesting point is that DRA (dynamic resource allocation) 
must be enabled. I guess the reason is that it lets us change the executor's 
memory settings after an OOM occurs; otherwise, a new executor/pod would hit 
the same OOM and die, and the Spark job would eventually fail. 
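   
   To make that guess concrete, here is a minimal sketch (not the POC's actual 
logic): it checks the first quoted condition and picks a larger memory size for 
replacement executors after an OOM. The function names, the 1.5x growth factor, 
and the 64 GiB cap are all hypothetical; only the 
spark.dynamicAllocation.enabled key comes from the issue. The second condition 
(scheduler support) cannot be checked from the config alone, so it is left out.

```python
# Hypothetical sketch: names, growth factor, and cap are assumptions,
# not taken from the Gluten POC.

def dynamic_allocation_enabled(conf):
    """Return True if spark.dynamicAllocation.enabled is set to true.

    `conf` is a plain dict of Spark config strings. Spark's default for
    this key is false, so a missing key counts as disabled.
    """
    value = conf.get("spark.dynamicAllocation.enabled", "false")
    return value.strip().lower() == "true"


def next_executor_memory_mb(current_mb, growth=1.5, cap_mb=65536):
    """Pick the memory size (MiB) for executors requested after an OOM.

    Without dynamic allocation, replacement executors would be started
    with the same settings and would likely hit the same OOM.
    """
    return min(int(current_mb * growth), cap_mb)


conf = {"spark.dynamicAllocation.enabled": "true"}
if dynamic_allocation_enabled(conf):
    retry_memory_mb = next_executor_memory_mb(4096)
```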
   
   I found that Uber has proposed an idea for solving pure on-heap OOM; it may 
help give more context on the reason for the DRA requirement above.
   https://www.uber.com/en-JP/blog/dynamic-executor-core-resizing-in-spark/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

