[
https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129289#comment-16129289
]
Amit Kumar commented on SPARK-20589:
------------------------------------
As you said, adding a job boundary in code will be much easier than paying the
overhead of HDFS serialization/deserialization and writing multiple
application-submit workflows.
Furthermore, the use case is the opposite of what you mentioned. It arises when
we have 1000 concurrent tasks and want to drop to 20 concurrent tasks to write
somewhere, without
* Forcing a coalesce to 20 partitions, which could create huge partitions and
possibly cause OOM and shuffle errors
* Affecting the parallelism of the earlier stages (the 1000 concurrent tasks,
etc.)
The point is that we don't want to reduce the number of partitions, because
doing so either affects the earlier tasks or creates huge partitions. But for
some stages in the pipeline we want to limit the number of active tasks at any
given time. Adding the boundary via simple code, as proposed by [~Dhruve Ashar],
seems a much simpler solution than breaking the pipeline into different
stages and running each with different configs. We do have to wait for his
complete solution before judging whether it is too complex, but if
he can achieve the result, I think it will be more beneficial for the
community.
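
To illustrate the distinction, here is a minimal single-process sketch using
plain Python threads. It is not Spark code and not [~Dhruve Ashar]'s proposal;
the function name, the 100 worker slots and the sleep that stands in for the
external write are all assumptions made for illustration. The partition count
stays at 1000, and only the number of writes in flight is capped at 20.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(num_partitions=1000, compute_slots=100, max_active_writes=20):
    """Process every partition, but cap how many writes run at once.

    Hypothetical sketch: compute_slots stands in for the cluster's task
    slots, and max_active_writes for the per-stage concurrency limit.
    """
    write_gate = threading.Semaphore(max_active_writes)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}
    written = []

    def write_partition(pid, rows):
        # Only max_active_writes threads can be inside this block together.
        with write_gate:
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.001)  # stand-in for the write to an external service
            with lock:
                written.append((pid, len(rows)))
                state["active"] -= 1

    def process(pid):
        rows = [pid] * 10  # stand-in for the upstream computation
        write_partition(pid, rows)

    # Partitions are never merged: each of the num_partitions is its own task.
    with ThreadPoolExecutor(max_workers=compute_slots) as pool:
        list(pool.map(process, range(num_partitions)))

    return state["peak"], len(written)


if __name__ == "__main__":
    peak, count = run_pipeline()
    print(f"partitions written: {count}, peak concurrent writes: {peak}")
```

Every partition keeps its original size, so nothing grows the way it would
after a coalesce to 20 partitions; the cap applies only to the write step.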
> Allow limiting task concurrency per stage
> -----------------------------------------
>
> Key: SPARK-20589
> URL: https://issues.apache.org/jira/browse/SPARK-20589
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks
> per stage. This is useful when your Spark job might be accessing another
> service and you don't want to DoS that service, for instance Spark writing
> to HBase or Spark doing HTTP PUTs on a service. Many times you want to do
> this without limiting the number of partitions.