[
https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129198#comment-16129198
]
Imran Rashid commented on SPARK-20589:
--------------------------------------
The proposed solution will only let you control concurrency for an entire job,
so if you want to change concurrency, you're already talking about breaking
things into multiple jobs. I know that isn't necessarily as bad as writing to
and reading from HDFS for an entirely new application, but it's still not
letting you do something like read from some data source with 20 concurrent
tasks and then immediately go into a longer series of shuffles/joins, etc.,
with thousands of concurrent tasks. You'd need to add a job boundary in there
somewhere.
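The job-boundary limitation described above can be sketched outside Spark with plain Python thread pools: the intermediate results have to be fully materialized (the "job boundary") before the wider-concurrency phase can start. The worker counts (20 and 200) and the doubling `fetch` function are illustrative stand-ins, not Spark APIs:

```python
# Standalone sketch (plain Python, no Spark) of the workaround described in
# the comment: to change concurrency you must finish one "job" before
# starting the next, materializing the intermediate results in between.
import threading
from concurrent.futures import ThreadPoolExecutor

peak = 0       # highest number of simultaneously running fetch tasks observed
active = 0
lock = threading.Lock()

def fetch(record):
    """Stand-in for a task that talks to a fragile external service."""
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    try:
        return record * 2          # placeholder for the slow external call
    finally:
        with lock:
            active -= 1

records = range(200)

# "Job" 1: at most 20 concurrent tasks while reading from the service.
with ThreadPoolExecutor(max_workers=20) as pool:
    fetched = list(pool.map(fetch, records))   # job boundary: results materialized

# "Job" 2: wide concurrency for the shuffle/join-style work that follows.
with ThreadPoolExecutor(max_workers=200) as pool:
    processed = list(pool.map(lambda x: x + 1, fetched))
```

Per-stage limits, as proposed in this issue, would let the scheduler cap the first phase without forcing the `list(...)` materialization between the two pools.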
> Allow limiting task concurrency per stage
> -----------------------------------------
>
> Key: SPARK-20589
> URL: https://issues.apache.org/jira/browse/SPARK-20589
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 2.1.0
> Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks
> per stage. This is useful when your Spark job might be accessing another
> service and you don't want to DoS that service. For instance, Spark writing
> to HBase or Spark doing HTTP PUTs on a service. Many times you want to do
> this without limiting the number of partitions.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)