[ 
https://issues.apache.org/jira/browse/SPARK-34389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282871#comment-17282871
 ] 

Attila Zsolt Piros commented on SPARK-34389:
--------------------------------------------

@Ranju, from the logs you attached it is clear that all the pod requests were 
processed and that the pods reached the PENDING state. See this part of [^Steps to 
reproduce.docx]:

 
{noformat}
exportdata-66af7377812a3b0c-exec-1   1/1   Running   0   20s   <host-ip-1>   <host-name-1>   <none>   <none>
exportdata-66af7377812a3b0c-exec-2   0/1   Pending   0   19s   <none>        <none>          <none>   <none>
exportdata-66af7377812a3b0c-exec-3   0/1   Pending   0   19s   <none>        <none>          <none>   <none>
{noformat}
On such an overloaded k8s cluster, where the available memory is 12GB and 
executors with 10GB are requested (1 is already allocated, so the available 
memory is now 2GB), there is no chance to allocate more. So things work 
as expected.
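
To make the arithmetic concrete, here is a minimal sketch in plain Python (not Spark or Kubernetes code). The function name is hypothetical and the numbers are the figures quoted above. It treats the cluster as a single memory pool and ignores the memory overhead that Spark adds on top of the executor memory for each pod, so real requests are somewhat larger and each pod also has to fit on a single node.

```python
def schedulable_executors(free_memory_gb: float, executor_memory_gb: float) -> int:
    """Return how many executor pods of the given size fit into the free memory.

    Hypothetical helper for illustration only: it assumes one shared memory
    pool and no per-pod memory overhead.
    """
    if executor_memory_gb <= 0:
        raise ValueError("executor_memory_gb must be positive")
    return max(0, int(free_memory_gb // executor_memory_gb))


# Figures from this comment: 12GB available, 10GB requested per executor.
first_round = schedulable_executors(12, 10)              # executors that fit initially
remaining_gb = 12 - first_round * 10                     # memory left after allocating them
second_round = schedulable_executors(remaining_gb, 10)   # executors that still fit afterwards

print(first_round, remaining_gb, second_round)
```

With these inputs only one executor fits, and the remaining executor pods stay in the Pending state until memory is freed.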

You have three options: increase the cluster size, decrease the load on the 
cluster, or request executors with less memory.
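
For the last option, a rough upper bound on the executor size can be estimated as follows. This is only a sketch under simplifying assumptions: the helper name is hypothetical, the cluster is treated as one memory pool, and memory overhead and other workloads are ignored, so the value you actually set for the executor memory should be lower than the result.

```python
def max_executor_memory_gb(free_memory_gb: float, executors: int) -> float:
    """Return the largest per-executor memory that lets all executors fit.

    Hypothetical helper for illustration only: it divides the free memory
    evenly and leaves no headroom for overhead or other pods.
    """
    if executors <= 0:
        raise ValueError("executors must be positive")
    return free_memory_gb / executors


# 12GB available and three executor pods, as in the listing above.
print(max_executor_memory_gb(12, 3))
```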

> Spark job on Kubernetes scheduled For Zero or less than minimum number of 
> executors and Wait indefinitely under resource starvation
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34389
>                 URL: https://issues.apache.org/jira/browse/SPARK-34389
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1
>            Reporter: Ranju
>            Priority: Major
>         Attachments: DriverLogs_ExecutorLaunchedLessThanMinExecutor.txt, 
> Steps to reproduce.docx
>
>
> If the cluster does not have sufficient resources (CPU/memory) for the minimum 
> number of executors, the executors stay in the Pending state indefinitely 
> until resources are freed.
> Suppose, Cluster Configurations are:
> total Memory=204Gi
> used Memory=200Gi
> free memory= 4Gi
> SPARK.EXECUTOR.MEMORY=10G
> SPARK.DYNAMICALLOCATION.MINEXECUTORS=4
> SPARK.DYNAMICALLOCATION.MAXEXECUTORS=8
> Rather, the job should be cancelled if the requested minimum number of 
> executors is not available at that point in time because of resource 
> unavailability. Currently it does partial scheduling or no scheduling and 
> waits indefinitely, so the job gets stuck.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
