[
https://issues.apache.org/jira/browse/SPARK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172861#comment-14172861
]
Marcelo Vanzin commented on SPARK-3174:
---------------------------------------
[~praveenseluka] I took a quick look at your design. I have a few coments:
* It ignores the shuffle, which is coverend in Andrew's design. You can't scale
down without figuring out what to do with the shuffle data of completed states,
since it may be reused. It's not just about cached blocks.
* Overall, I like the idea of having "pluggable" scaling algorithms, but I
think it's too early for a public add/remove executor API. For example,
standalone mode doesn't support adding or removing executors, AFAIK.
* The "task runtime"-based heuristic looks a bit weird. I think it needs to be
more dynamic - e.g., actually consider the backlog of tasks vs. number of tasks
that can be run concurrently on the current executors. Also, "doubling number
of executors" seems a bit like a sledgehammer - an exponential approach like
Andrew has proposed might be able to reach similar results with lower overall
system load.
I haven't looked at Andrew's PR yet (it's on my todo list), but I don't see a
lot that is that much different in the two proposals. I agree that if it's
possible it would be nice to keep the autoscaling code / interfaces as separate
as possible from the current core, if only to make them easier to replace /
refactor / disable, but then again, I haven't looked at Andrew's code yet.
> Provide elastic scaling within a Spark application
> --------------------------------------------------
>
> Key: SPARK-3174
> URL: https://issues.apache.org/jira/browse/SPARK-3174
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core, YARN
> Affects Versions: 1.0.2
> Reporter: Sandy Ryza
> Assignee: Andrew Or
> Attachments: SPARK-3174design.pdf, SparkElasticScalingDesignB.pdf,
> dynamic-scaling-executors-10-6-14.pdf
>
>
> A common complaint with Spark in a multi-tenant environment is that
> applications have a fixed allocation that doesn't grow and shrink with their
> resource needs. We're blocked on YARN-1197 for dynamically changing the
> resources within executors, but we can still allocate and discard whole
> executors.
> It would be useful to have some heuristics that
> * Request more executors when many pending tasks are building up
> * Discard executors when they are idle
> See the latest design doc for more information.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]