Yohahaha commented on issue #8018: URL: https://github.com/apache/incubator-gluten/issues/8018#issuecomment-2495794197
Thank you for proposing this great idea, and glad to see the POC has already shown benefits in your prod env!

> Meets the stage level resource conditions
> 1. executor dynamic allocation is enabled, spark.dynamicAllocation.enabled must be true
> 2. Underlying resource scheduler must support dynamic allocate executor

For me, the most interesting thing is that DRA (dynamic resource allocation) must be enabled. I guess the reason is so that the executor's memory settings can be changed after an OOM occurs; otherwise, the new executor/pod would hit the same OOM and die, and the Spark job would eventually fail.

I found that Uber has proposed an idea for solving pure on-heap OOM; it may help give more context on the reason for the DRA requirement above.
https://www.uber.com/en-JP/blog/dynamic-executor-core-resizing-in-spark/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
