FelixYBW commented on issue #8018: URL: https://github.com/apache/incubator-gluten/issues/8018#issuecomment-2499253582
> Moreover, does the feature target batch query scenarios (ETL, nightly, etc.) more? I remember that changing a Spark resource profile usually causes executors to restart, which would add latency to ad-hoc queries.

There were some earlier talks with the Pinterest team. We have three ways to do this:

1. The normal way: we initially start executors with large offheap memory. When a task needs large onheap memory, the driver kills the current executor (including all other tasks running on that executor in different task threads) and restarts a new executor with large onheap memory to run the task. The total executor count and resource usage stay as configured. We may not need to hack Spark in this way.
2. We start two executor pools that share the same memory resource: one with large offheap memory, one with large onheap memory. At any time, the task scheduler schedules a task to either the offheap or the onheap executors. This way we can make sure the resource isn't overcommitted, and it avoids frequent restarts of the offheap/onheap executors.
3. Like 1, but with spark.dynamicAllocation.maxExecutors enabled, which I'd expect most customers use: we can start new executors with large onheap memory if resources are still available. This may lead to a situation where the new executor never gets a chance to start.

To create a POC, we may start from 2 (with 1/2 of executor.instances for offheap and 1/2 for onheap) or from 3. What do you think?

To estimate the offheap/onheap ratio, we can start with a configurable value, e.g. 8:2 (offheap:onheap) for the offheap executors and 0:10 for the onheap executors.

Another thing to consider is that vanilla Spark also supports offheap memory, but it still needs large onheap memory, and I didn't find any guideline on how to configure this. I'm also not sure whether the Spark community is still working on moving all large memory allocations from onheap to offheap.
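To make option 2 concrete, here is a rough configuration sketch, assuming a 10g total memory budget per executor and the 8:2 / 0:10 ratios above. The config keys (`spark.executor.*`, `spark.memory.offHeap.*`, `spark.dynamicAllocation.*`) are standard Spark settings, but the instance counts and sizes are illustrative only, and running two differently-configured pools inside one application is exactly the part that would need the scheduler changes discussed here, since vanilla Spark configures all executors of an application identically:

```shell
# Pool A: "offheap" executors for native (Gluten) tasks, 8:2 offheap:onheap.
# Illustrative values; today this would be a separate launch or a Spark-side change.
--conf spark.executor.instances=8
--conf spark.executor.memory=2g
--conf spark.memory.offHeap.enabled=true
--conf spark.memory.offHeap.size=8g

# Pool B: "onheap" executors for fallback row-based tasks, 0:10.
--conf spark.executor.instances=8
--conf spark.executor.memory=10g
--conf spark.memory.offHeap.enabled=false
```

For option 3, Pool A's settings would be the application defaults, with `spark.dynamicAllocation.enabled=true` and `spark.dynamicAllocation.maxExecutors` bounding how many additional large-onheap executors could be requested when resources allow.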
If one day all large memory allocations in vanilla Spark can be made offheap, we won't have this issue, but then the new issue is how Gluten and Spark share offheap memory, which isn't fully solved today.
