[ 
https://issues.apache.org/jira/browse/SPARK-30949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044963#comment-17044963
 ] 

Dongjoon Hyun commented on SPARK-30949:
---------------------------------------

Hi, [~onursatici]. Thank you for filing a JIRA and making a PR.
BTW, why do you think this is `Dependency upgrade`?

> Driver cores in kubernetes are coupled with container resources, not 
> spark.driver.cores
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-30949
>                 URL: https://issues.apache.org/jira/browse/SPARK-30949
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: Kubernetes
>    Affects Versions: 3.0.0
>            Reporter: Onur Satici
>            Priority: Major
>
> Drivers submitted in kubernetes cluster mode set the parallelism of various 
> components like 'RpcEnv', 'MemoryManager', 'BlockManager' from inferring the 
> number of available cores by calling:
> {code:java}
> Runtime.getRuntime().availableProcessors()
> {code}
> By using this, spark applications running on java 8 or older incorrectly get 
> the total number of cores in the host, ignoring the cgroup limits set by 
> kubernetes (https://bugs.openjdk.java.net/browse/JDK-6515172). Java 9 and 
> newer runtimes do not have this problem.
> Orthogonal to this, it is currently not possible to decouple resource limits 
> on the driver container with the amount of parallelism of the various network 
> and memory components listed above.
> My proposal is to use the 'spark.driver.cores' configuration to get the 
> amount of parallelism, like we do for YARN 
> (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L2762-L2767).
>  This will enable users to specify 'spark.driver.cores' to set parallelism, 
> and specify 'spark.kubernetes.driver.requests.cores' to limit the resource 
> requests of the driver container. Further, this will remove the need to call 
> 'availableProcessors()', thus the same number of cores will be used for 
> parallelism independent of the java runtime version.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to