[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023769#comment-16023769 ]

Mark Nelson commented on SPARK-20589:
-------------------------------------

I would find this very useful.  We're currently using coalesce to limit the
number of simultaneous tasks in a stage that queries Cassandra, but this gives
us huge partitions.  If a query timeout causes a task to fail, we can lose a
lot of work, and the chances of an individual task failing 4 times are high,
killing the entire job.  Ideally I would like the stage to have a large number
of partitions but limit the number of simultaneous tasks for that one stage.
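
A minimal sketch of the coalesce workaround described above (the keyspace,
table, the cap of 8 tasks, and the output sink are hypothetical placeholders;
it assumes the DataStax Spark Cassandra connector is on the classpath):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("coalesce-workaround")
  .getOrCreate()

// Hypothetical read through the Spark Cassandra connector.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .option("keyspace", "my_keyspace")   // placeholder
  .option("table", "my_table")         // placeholder
  .load()

// coalesce(8) caps the stage at 8 tasks, so at most 8 queries hit Cassandra
// at once -- but it also collapses the data into 8 very large partitions,
// so a single query timeout throws away a lot of work, and a task that
// fails spark.task.maxFailures (default 4) times kills the whole job.
df.coalesce(8)
  .write
  .parquet("/tmp/output")   // placeholder sink

spark.stop()
{code}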

> Allow limiting task concurrency per stage
> -----------------------------------------
>
>                 Key: SPARK-20589
>                 URL: https://issues.apache.org/jira/browse/SPARK-20589
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks 
> per stage.  This is useful when your Spark job is accessing another service 
> and you don't want to DOS that service, for instance Spark writing to HBase 
> or Spark doing HTTP PUTs against a service.  Many times you want to do this 
> without limiting the number of partitions. 


