[
https://issues.apache.org/jira/browse/SPARK-34389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282871#comment-17282871
]
Attila Zsolt Piros commented on SPARK-34389:
--------------------------------------------
@Ranju, from the logs you attached it is clear that all the pod requests were
processed and the executor pods were created: one is Running and the others are
Pending. Check this part of [^Steps to reproduce.docx]:
{noformat}
exportdata-66af7377812a3b0c-exec-1   1/1   Running   0   20s   <host-ip-1>   <host-name-1>   <none>   <none>
exportdata-66af7377812a3b0c-exec-2   0/1   Pending   0   19s   <none>        <none>          <none>   <none>
exportdata-66af7377812a3b0c-exec-3   0/1   Pending   0   19s   <none>        <none>          <none>   <none>
{noformat}
On such an overloaded k8s cluster, where the available memory is 12GB and
executors with 10GB each are requested (and 1 is already allocated, so the
available memory is now 2GB), there is no chance to allocate more. So things
just work as expected.
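
The arithmetic can be sketched in a few lines of Python. The function name
`executors_that_fit` is hypothetical, and the sketch deliberately ignores the
memory overhead Spark adds on top of the executor memory, CPU requests, and
how free memory is spread across nodes, so it gives an upper bound rather than
what the Kubernetes scheduler actually computes.

```python
def executors_that_fit(free_gb: int, executor_gb: int) -> int:
    """Upper bound on how many executor pods of this size fit in the free memory."""
    if executor_gb <= 0:
        raise ValueError("executor_gb must be positive")
    return max(free_gb, 0) // executor_gb


free_gb = 12      # available memory before any executor is scheduled (from the comment)
executor_gb = 10  # memory requested per executor (from the comment)

running = executors_that_fit(free_gb, executor_gb)
remaining_gb = free_gb - running * executor_gb
still_schedulable = executors_that_fit(remaining_gb, executor_gb)

print(f"running={running} remaining_gb={remaining_gb} still_schedulable={still_schedulable}")
```

With these numbers one executor fits, 2GB remain, and no further 10GB executor
can be placed, which matches the one Running and two Pending pods in the log.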
You have three options: increase the cluster size, decrease the load on the
cluster, or request executors with less memory.
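
For the third option, a rough sizing sketch in Python may help. It assumes
that 12GB means 12 * 1024 MiB, that all three executor pods from the log
should fit, and that each pod requests the executor memory plus an overhead of
10% with a 384 MiB minimum (my understanding of Spark's JVM default; please
verify it for your version and settings). The helper names are hypothetical,
and the sketch treats the cluster as a single pool of memory, which a
multi-node cluster is not.

```python
OVERHEAD_MIN_MIB = 384  # assumed minimum memory overhead per executor pod


def pod_memory_mib(executor_mib: int) -> int:
    """Approximate pod memory request: executor memory plus the assumed overhead."""
    return executor_mib + max(executor_mib // 10, OVERHEAD_MIN_MIB)


def largest_fitting_executor_mib(free_mib: int, wanted_executors: int) -> int:
    """Largest executor memory such that `wanted_executors` pods fit into `free_mib`."""
    if wanted_executors <= 0:
        raise ValueError("wanted_executors must be positive")
    budget = free_mib // wanted_executors  # memory available to each pod
    size = budget
    # pod_memory_mib grows with size, so the first size that fits is the largest.
    while size > 0 and pod_memory_mib(size) > budget:
        size -= 1
    return size


free_mib = 12 * 1024  # the 12GB from the comment, assumed to mean GiB
wanted = 3            # exec-1, exec-2 and exec-3 from the log

size_mib = largest_fitting_executor_mib(free_mib, wanted)
conf_line = f"spark.executor.memory={size_mib}m"
print(conf_line)
```

The printed setting is only a starting point: node-level fragmentation, CPU
requests, and other workloads on the cluster can still leave pods Pending.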
> Spark job on Kubernetes scheduled For Zero or less than minimum number of
> executors and Wait indefinitely under resource starvation
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-34389
> URL: https://issues.apache.org/jira/browse/SPARK-34389
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.1
> Reporter: Ranju
> Priority: Major
> Attachments: DriverLogs_ExecutorLaunchedLessThanMinExecutor.txt,
> Steps to reproduce.docx
>
>
> In case the cluster does not have sufficient resources (CPU/memory) for the
> minimum number of executors, the executors go into the Pending state for an
> indefinite time until resources become free.
> Suppose the cluster configuration is:
> total Memory=204Gi
> used Memory=200Gi
> free memory= 4Gi
> spark.executor.memory=10G
> spark.dynamicAllocation.minExecutors=4
> spark.dynamicAllocation.maxExecutors=8
> Rather, the job should be cancelled if the requested minimum number of
> executors is not available at that point in time because of resource
> unavailability. Currently it does partial scheduling or no scheduling and
> waits indefinitely, and the job gets stuck.