[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023769#comment-16023769 ]
Mark Nelson commented on SPARK-20589:
-------------------------------------
I would find this very useful. We're currently using coalesce to limit the
simultaneous tasks in a stage that queries Cassandra, but that gives us huge
partitions. If a query timeout causes a task to fail, we can lose a lot of
work, and the chance of an individual task failing 4 times is high, which
kills the entire job. Ideally I would like the stage to have a large number of
partitions, but with the number of simultaneous tasks limited for this one
stage.
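
Roughly what we do today looks like the sketch below (a minimal illustration
only: queryCassandra is a placeholder for our real per-key lookup, and the
paths and numbers are made up):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("cassandra-lookup").getOrCreate()

    // Placeholder for the real per-key Cassandra query (session handling
    // omitted); the real code issues a query that can time out.
    def queryCassandra(key: String): String = key

    // The input naturally has many small partitions, which would give good
    // failure isolation, but then too many tasks would hit Cassandra at once.
    val keys = spark.sparkContext.textFile("hdfs:///input/keys")

    // Workaround: coalesce so that at most 16 tasks query Cassandra at a time.
    // The cost is that each of those 16 partitions is huge, so one query
    // timeout throws away a lot of work, and the same oversized task failing
    // 4 times kills the entire job.
    val results = keys
      .coalesce(16)
      .mapPartitions(iter => iter.map(queryCassandra))

    results.saveAsTextFile("hdfs:///output/results")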
> Allow limiting task concurrency per stage
> -----------------------------------------
>
> Key: SPARK-20589
> URL: https://issues.apache.org/jira/browse/SPARK-20589
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks
> per stage. This is useful when your Spark job might be accessing another
> service and you don't want to DoS that service, for instance Spark writing
> to HBase or Spark doing HTTP PUTs against a service. Many times you want to
> do this without limiting the number of partitions.
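
For contrast, the closest knobs that exist today throttle the whole
application rather than one stage. A minimal sketch of that application-wide
workaround (the executor and core counts are illustrative):

    import org.apache.spark.sql.SparkSession

    // Application-wide throttle: at most instances x cores tasks run at once,
    // but this caps EVERY stage of the job, not just the one talking to the
    // external service.
    val spark = SparkSession.builder()
      .appName("throttled-job")
      .config("spark.dynamicAllocation.enabled", "false")
      .config("spark.executor.instances", "4")  // 4 executors ...
      .config("spark.executor.cores", "4")      // ... x 4 cores = at most 16 concurrent tasks
      .getOrCreate()

Because this cap applies to every stage, it also slows down the parts of the
job that never touch the external service, which is exactly why a per-stage
limit would help.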