Adam Kennedy created SPARK-25889:
------------------------------------
Summary: Dynamic allocation load-aware ramp up
Key: SPARK-25889
URL: https://issues.apache.org/jira/browse/SPARK-25889
Project: Spark
Issue Type: New Feature
Components: Scheduler, YARN
Affects Versions: 2.3.2
Reporter: Adam Kennedy
The time based exponential ramp up behavior for dynamic allocation is naive and
destructive, making it very difficult to run very large jobs.
On a large (64,000 core) YARN cluster with a high number of input partitions
(200,000+) the default dynamic allocation approach of requesting containers in
waves, doubling exponentially once per second, results in 50% of the entire
cluster being requested in the final 1 second wave.
This can easily overwhelm RPC processing, or cause expensive Executor startup
steps to break systems. With the interval so short, many additional containers
may be requested beyond what is actually needed and then complete very little
work before sitting around waiting to be deallocated.
Delaying the time between these fixed doublings only has limited impact.
Setting double intervals to once per minute would result in a very slow ramp up
speed, at the end of which we still face large potentially crippling waves of
executor startup.
An alternative approach to spooling up large job appears to be needed, which is
still relatively simple but could be more adaptable to different cluster sizes
and differing cluster and job performance.
I would like to propose a few different approaches based around the general
idea of controlling outstanding requests for new containers based on the number
of executors that are currently running, for some definition of "running".
One example might be to limit requests to one new executor for every existing
executor that currently has an active task. Or some ratio of that, to allow for
more or less aggressive spool up. A lower number would let us approximate
something like fibonacci ramp up, a higher number of say 2x would spool up
quickly, but still aligned with the rate at which broadcast blocks can be
easily distributed to new members.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]