Re: [I] Implement stage-level resourceProfile auto-adjust framework to avoid oom [incubator-gluten]

via GitHub Fri, 22 Nov 2024 17:40:14 -0800


FelixYBW commented on issue #8018:
URL: 
https://github.com/apache/incubator-gluten/issues/8018#issuecomment-2495191927


   @zjuwangg Thank you for your investigation! It's really something we'd like 
to do.
   
   - We should also consider how this interacts with RAS. 
   - We need to predefine the potential memory usage of some operators. For 
example, Scan and Project in Velox consume little memory, while aggregate and 
join need much more. So for a scan + fallback aggregate, we can set small 
off-heap + large on-heap. For an offloaded agg + fallback join, we would 
currently need to set large off-heap + large on-heap; in that case we should 
instead fall back the agg, or even the whole stage, and then set a large 
on-heap memory.
   - It would be even better if we could specify a different fallback policy 
when a task is retried, so that some tasks offload to Velox while others retry 
with fallback. In theory it's possible, but more complex.
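
   To make the second point concrete, here is a rough sketch of how such a 
rule could look. Everything in it is hypothetical: the operator classes, the 
"small"/"large" labels and the names `Op`, `MemoryPlan` and `plan_stage` are 
made up for illustration and are not existing Gluten or Spark APIs.

```python
from dataclasses import dataclass

# Assumption: a predefined classification of operators by memory appetite.
# Operators not listed here are treated as light.
HEAVY_OPS = {"aggregate", "join"}


@dataclass(frozen=True)
class Op:
    name: str
    offloaded: bool  # True: runs in Velox (off-heap); False: fallback (on-heap)


@dataclass(frozen=True)
class MemoryPlan:
    off_heap: str              # "small" or "large"
    on_heap: str               # "small" or "large"
    fall_back_heavy_ops: bool  # True: fall back offloaded heavy ops (or the stage)


def plan_stage(ops):
    """Pick a coarse memory plan for one stage from its operators."""
    heavy_off_heap = any(op.offloaded and op.name in HEAVY_OPS for op in ops)
    heavy_on_heap = any(not op.offloaded and op.name in HEAVY_OPS for op in ops)

    if heavy_off_heap and heavy_on_heap:
        # Both sides would need a lot of memory. Rather than sizing both
        # large, fall back the offloaded heavy operators and give the
        # memory to on-heap only.
        return MemoryPlan("small", "large", True)

    return MemoryPlan(
        "large" if heavy_off_heap else "small",
        "large" if heavy_on_heap else "small",
        False,
    )


# scan (offloaded) + fallback aggregate: small off-heap, large on-heap
print(plan_stage([Op("scan", True), Op("aggregate", False)]))
# offloaded agg + fallback join: fall back the agg, large on-heap only
print(plan_stage([Op("aggregate", True), Op("join", False)]))
```

   A real implementation would use estimated sizes rather than two labels, 
and would have to map the result onto a stage-level ResourceProfile.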
   
   @PHILO-HE did some investigation a while ago and noted that some code 
changes in vanilla Spark are necessary. Did you notice that as well? If so, we 
may hack the code in Gluten first and then submit a PR to upstream Spark.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


