[
https://issues.apache.org/jira/browse/SPARK-24591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516595#comment-16516595
]
Apache Spark commented on SPARK-24591:
--------------------------------------
User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/21589
> Number of cores and executors in the cluster
> --------------------------------------------
>
> Key: SPARK-24591
> URL: https://issues.apache.org/jira/browse/SPARK-24591
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: Maxim Gekk
> Priority: Minor
>
> Need to add two new methods. The first should return the total number of CPU
> cores across all executors in the cluster. The second should give the current
> number of executors registered in the cluster.
> The main motivations for adding these methods are:
> 1. It is best practice to manage job parallelism relative to the available
> cores, e.g., df.repartition(5 * sc.coreCount). In particular, it is an
> anti-pattern to leave a bunch of cores on large clusters twiddling their
> thumbs, doing nothing. Usually users pass predefined constants to
> _repartition()_ and _coalesce()_, choosing the constant based on the
> current cluster size. If the code runs on another cluster and/or on a
> resized cluster, they need to modify the constant each time. This happens
> frequently when a job that normally runs on, say, an hour of data on a small
> cluster needs to run on a week of data on a much larger cluster.
> 2. *spark.default.parallelism* can be used to get the total number of cores in
> the cluster, but it can be redefined by the user. The info can also be obtained
> by registering a listener, but repeating that boilerplate in every job is ugly.
> We should follow the DRY principle.
> 3. Regarding executorsCount(), some jobs, e.g., local-node ML training,
> use a lot of parallelism. It is common practice to distribute such
> jobs so that there is one partition per executor.
>
> 4. In some places users collect this info, along with other settings info
> and job timings (at the app level), for analysis. E.g., ML can be used to
> determine the optimal cluster size given different objectives, e.g.,
> fastest throughput vs. lowest cost per unit of processing.
> 5. The simpler argument is that basic cluster properties should be easily
> discoverable via APIs.
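To make motivation 1 concrete, here is a minimal sketch in plain Python. The helper name is hypothetical, and the proposed coreCount/executorsCount accessors do not exist in Spark yet; the point is only that the repartition target is derived from the cluster's core count rather than hard-coded:

```python
def target_partitions(core_count: int, partitions_per_core: int = 5) -> int:
    """Size a repartition() call relative to the cluster's total core count,
    instead of using a hard-coded constant that must be edited whenever the
    job moves to a differently sized cluster."""
    return partitions_per_core * core_count

# With the proposed API this would read, e.g.:
#   df.repartition(target_partitions(sc.coreCount))
# On a 16-core cluster: target_partitions(16) -> 80 partitions.
print(target_partitions(16))
```

Because the target scales with the cluster, the same job code runs unchanged on a small cluster over an hour of data and on a much larger cluster over a week of data.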
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]