[ https://issues.apache.org/jira/browse/SPARK-21122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209918#comment-16209918 ]
Craig Ingram commented on SPARK-21122:
--------------------------------------
Finally getting back around to this. Thanks for the feedback [~tgraves]. I
agree with pretty much everything you pointed out. Newer versions of YARN do
have a [Cluster Metrics
API|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API],
though I believe it was only introduced in 2.8. Regardless, working with
YARN's PreemptionMessage is a better path forward than what I initially
proposed. I wish there were an elegant way to do this generically, but I
believe I can at least build an abstraction over the PreemptionMessage that
other RM clients could implement. For now, I will focus on a YARN-specific
solution.
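
A minimal sketch of what that abstraction might look like; PreemptionNotice,
YarnPreemptionAdapter, and executorIdFor are hypothetical names, while the
PreemptionMessage and contract types are the real YARN records API:

    import org.apache.hadoop.yarn.api.records.PreemptionMessage
    import scala.collection.JavaConverters._

    // RM-agnostic notice a scheduler backend could hand to the allocation policy.
    case class PreemptionNotice(
        forcedExecutorIds: Seq[String],     // the RM will reclaim these regardless
        negotiableExecutorIds: Seq[String]) // the app may release these voluntarily

    object YarnPreemptionAdapter {
      // Translate YARN's PreemptionMessage into the generic notice, given a
      // mapping from YARN container IDs to Spark executor IDs.
      def toNotice(msg: PreemptionMessage,
                   executorIdFor: String => Option[String]): PreemptionNotice = {
        val strict = Option(msg.getStrictContract)
          .map(_.getContainers.asScala.map(_.getId.toString).toSeq)
          .getOrElse(Seq.empty)
        val negotiable = Option(msg.getContract)
          .map(_.getContainers.asScala.map(_.getId.toString).toSeq)
          .getOrElse(Seq.empty)
        PreemptionNotice(strict.flatMap(executorIdFor),
                         negotiable.flatMap(executorIdFor))
      }
    }
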
> Address starvation issues when dynamic allocation is enabled
> ------------------------------------------------------------
>
> Key: SPARK-21122
> URL: https://issues.apache.org/jira/browse/SPARK-21122
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, YARN
> Affects Versions: 2.2.0, 2.3.0
> Reporter: Craig Ingram
> Attachments: Preventing Starvation with Dynamic Allocation Enabled.pdf
>
>
> When dynamic resource allocation is enabled on a cluster, it’s currently
> possible for one application to consume all the cluster’s resources,
> effectively starving any other application trying to start. This is
> particularly painful in a notebook environment where notebooks may be idle
> for tens of minutes while the user is figuring out what to do next (or eating
> their lunch). Ideally the application should give resources back to the
> cluster when monitoring indicates other applications are pending.
> Before delving into the specifics of the solution, there are some workarounds
> to this problem that are worth mentioning:
> * Set spark.dynamicAllocation.maxExecutors to a small value so that users
> are unlikely to consume the entire cluster even when many of them are doing
> work (see the configuration sketch after this list). This approach will hurt
> cluster utilization.
> * If using YARN, enable preemption and have each application (or
> organization) run in a separate queue. The downside is that when YARN
> preempts, it knows nothing about the executor it is killing: it is just as
> likely to kill a long-running executor with cached data as one that just
> spun up. Moreover, given a feature like
> https://issues.apache.org/jira/browse/SPARK-21097 (preserving cached data on
> executor decommission), YARN may not wait long enough between attempting a
> graceful shutdown and forcefully killing the executor. The blocks that
> belonged to that executor would then be lost and have to be recomputed.
> * Configure YARN to use the capacity scheduler with multiple scheduler
> queues, and put high-priority notebook users into a high-priority queue.
> This prevents high-priority users from being starved out by low-priority
> notebook users, but does not prevent users in the same priority class from
> starving each other.
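> As a point of reference, the first workaround above is purely configuration.
> A minimal sketch (the cap of 20 is illustrative; tune it per cluster):
>
>     import org.apache.spark.SparkConf
>
>     // Cap how much of the cluster any one application can hold on to.
>     val conf = new SparkConf()
>       .set("spark.dynamicAllocation.enabled", "true")
>       .set("spark.shuffle.service.enabled", "true") // needed for dynamic allocation on YARN
>       .set("spark.dynamicAllocation.maxExecutors", "20") // illustrative cap
>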
> Obviously any solution to this problem that depends on YARN would leave
> other resource managers out in the cold. The solution proposed in this
> ticket will afford Spark clusters the flexibility to hook in different
> resource allocation policies to fulfill their users' needs regardless of
> resource manager choice.
> Initially the focus will be on notebook environments. When many users share
> such an environment, the goal is fair resource allocation. Given that all
> users will be using the same memory configuration, this solution will focus
> primarily on fair sharing of cores.
> The fair resource allocation policy should initially pick executors to
> remove based on three factors: idleness, presence of cached data, and
> uptime. The policy will favor removing executors that are idle, short-lived,
> and have no cached data. It will only preemptively remove executors when
> there are pending applications or outstanding core requests (otherwise the
> default dynamic allocation timeout/removal process is followed). The policy
> will also allow an application's resource consumption to expand based on
> cluster utilization. For example, if there are 3 applications running but 2
> of them are idle, the policy will allow a busy application with pending
> tasks to consume more than 1/3 of the cluster's resources.
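> A minimal sketch of how this policy might rank removal candidates and
> compute the expandable share; every name here is hypothetical rather than
> an existing Spark API:
>
>     // Hypothetical per-executor snapshot the policy would consume.
>     case class ExecutorSnapshot(id: String,
>                                 idleMillis: Long,
>                                 uptimeMillis: Long,
>                                 cachedBlocks: Int)
>
>     // Favor removing executors that are idle, hold no cached data, and are
>     // short-lived: most idle first, then fewest cached blocks, then
>     // shortest uptime.
>     def removalOrder(execs: Seq[ExecutorSnapshot]): Seq[ExecutorSnapshot] =
>       execs.sortBy(e => (-e.idleMillis, e.cachedBlocks, e.uptimeMillis))
>
>     // Let a busy application expand past its static 1/N share when other
>     // applications are idle: 12 cores and 3 apps, but only 1 busy => 12.
>     def allowedCores(totalCores: Int, busyApps: Int): Int =
>       totalCores / math.max(1, busyApps)
>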
> More complexity could be added later to take advantage of task/stage
> metrics, histograms, and heuristics (e.g. favoring removal of executors
> whose in-flight tasks are quick to rerun). The important thing is to
> benchmark effectively before adding complexity so we can measure the impact
> of each change.