[ 
https://issues.apache.org/jira/browse/SPARK-23825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419857#comment-16419857
 ] 

David Vogelbacher commented on SPARK-23825:
-------------------------------------------

Will make a PR shortly, cc [~mcheah]

> [K8s] Spark pods should request memory + memoryOverhead as resources
> --------------------------------------------------------------------
>
>                 Key: SPARK-23825
>                 URL: https://issues.apache.org/jira/browse/SPARK-23825
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.3.0
>            Reporter: David Vogelbacher
>            Priority: Major
>
> We currently request {{spark.[driver,executor].memory}} as memory from
> Kubernetes (e.g., 
> [here|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala#L95]).
> The limit is set to {{spark.[driver,executor].memory + 
> spark.kubernetes.[driver,executor].memoryOverhead}}.
> This appears to be a misuse of Kubernetes resource requests. As
> [How Pods with resource limits are 
> run|https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run]
> states:
> {noformat}
> If a Container exceeds its memory request, it is likely that its Pod will be 
> evicted whenever the node runs out of memory.
> {noformat}
> Thus, if the Spark driver or executor uses {{memory + memoryOverhead}}
> memory, it exceeds its memory request and can be evicted. An evicted
> executor might get restarted (though this would still be very bad
> performance-wise), but an evicted driver would be hard to recover.
> I think Spark should be able to run on the requested (and thus guaranteed)
> resources from Kubernetes, without being in danger of termination and
> without needing to rely on optionally available resources.
> Thus, we should request {{memory + memoryOverhead}} memory from Kubernetes 
> (and this should also be the limit).
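
To illustrate the difference between the current and the proposed settings, here is a minimal Python sketch. It is not Spark's actual (Scala) code: the function names, the MemoryResources type, and the example figures (4096 MiB of memory, 410 MiB of overhead) are hypothetical and only mirror the arithmetic described in the issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryResources:
    """Memory request and limit for one pod, in MiB (hypothetical type)."""
    request_mib: int
    limit_mib: int


def current_resources(memory_mib: int, overhead_mib: int) -> MemoryResources:
    # Behaviour described in the issue: only `memory` is requested,
    # while the limit is `memory + memoryOverhead`.
    return MemoryResources(request_mib=memory_mib,
                           limit_mib=memory_mib + overhead_mib)


def proposed_resources(memory_mib: int, overhead_mib: int) -> MemoryResources:
    # Proposal from the issue: request `memory + memoryOverhead`
    # and use the same value as the limit.
    total = memory_mib + overhead_mib
    return MemoryResources(request_mib=total, limit_mib=total)


def unguaranteed_mib(resources: MemoryResources) -> int:
    # Memory the pod may use up to its limit that is not covered by its request.
    return resources.limit_mib - resources.request_mib


if __name__ == "__main__":
    # Assumed example values, not Spark defaults.
    memory_mib, overhead_mib = 4096, 410
    for label, res in [
        ("current", current_resources(memory_mib, overhead_mib)),
        ("proposed", proposed_resources(memory_mib, overhead_mib)),
    ]:
        print(f"{label}: request={res.request_mib}Mi "
              f"limit={res.limit_mib}Mi "
              f"unguaranteed={unguaranteed_mib(res)}Mi")
```

With the current settings, the overhead portion sits between the request and the limit, so a pod that uses it has exceeded its request. With the proposed settings, the request and the limit coincide, and no part of the pod's expected usage lies above the request.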



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

