[ 
https://issues.apache.org/jira/browse/SPARK-33288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222932#comment-17222932
 ] 

Thomas Graves commented on SPARK-33288:
---------------------------------------

Yes. Really, when I say "hanging" I mean k8s won't be able to give you an
executor that matches the requested resources.

The same thing can happen right now if my cluster doesn't have the resources I
request. Let's say I have a 1-node k8s cluster with 24 cores. If I ask for an
executor with 64 cores, Spark hangs waiting to get that executor, and k8s will
never be able to provide it.
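
To make the arithmetic in that example concrete, here is a minimal sketch in
plain Python. It is not Spark or Kubernetes code; the helper name
can_ever_schedule and the list-of-node-sizes model are assumptions made purely
for illustration.

```python
from typing import List


def can_ever_schedule(executor_cores: int, node_cores: List[int]) -> bool:
    """Return True if at least one node is big enough to host the executor.

    Hypothetical helper: it only compares core counts and ignores memory,
    other pods, and every other scheduling constraint.
    """
    return any(cores >= executor_cores for cores in node_cores)


# The 1-node, 24-core cluster from the example above.
cluster = [24]

print(can_ever_schedule(64, cluster))  # 64 cores never fit on a 24-core node
print(can_ever_schedule(16, cluster))  # 16 cores can fit
```

When the check is false, no amount of waiting helps, which is the "hang"
described above.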

It's just that in the case of stage level scheduling, the reason you might not
get an executor is that other executors in the same application are still
running because they have shuffle data on them. And like you say, if you don't
set the timeout (it defaults to infinity), it will "hang".
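
The stage level scheduling case can be sketched the same way. This is a toy
model in plain Python, not Spark's allocation logic: the function name
wait_for_new_executor, its parameters, and the single-pool-of-cores
simplification are all assumptions for illustration. The timeout mentioned
above is presumably spark.dynamicAllocation.shuffleTracking.timeout; the model
represents it as a number of seconds, with math.inf standing for "not set".

```python
import math


def wait_for_new_executor(free_cores, needed_cores, held_cores, shuffle_timeout_s):
    """Seconds until an executor with needed_cores can start, or math.inf if never.

    free_cores: cores currently unused in the cluster
    held_cores: cores held by idle executors that still have shuffle data
    shuffle_timeout_s: how long those executors are kept; math.inf means forever

    Toy model: the cluster is treated as one pool of cores.
    """
    if needed_cores <= free_cores:
        return 0.0  # fits right away
    if needed_cores <= free_cores + held_cores:
        # Fits only once the shuffle-holding executors are released.
        return shuffle_timeout_s
    return math.inf  # can never fit, regardless of the timeout


# 24-core cluster: 20 cores held by executors with shuffle data, 4 free.
print(wait_for_new_executor(4, 16, 20, math.inf))  # no timeout set
print(wait_for_new_executor(4, 16, 20, 600))       # timeout of 600 seconds
```

With no timeout the wait is infinite even though the cluster is large enough,
which is the situation the comment calls a "hang".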

> Support k8s cluster manager with stage level scheduling
> -------------------------------------------------------
>
>                 Key: SPARK-33288
>                 URL: https://issues.apache.org/jira/browse/SPARK-33288
>             Project: Spark
>          Issue Type: New Feature
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> Kubernetes supports dynamic allocation via the
> {{spark.dynamicAllocation.shuffleTracking.enabled}}
> config, we can add support for stage level scheduling when that is turned
> on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

