Maxim Gekk created SPARK-24591:
----------------------------------
Summary: Number of cores and executors in the cluster
Key: SPARK-24591
URL: https://issues.apache.org/jira/browse/SPARK-24591
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.1
Reporter: Maxim Gekk
Two new methods are needed. The first should return the total number of CPU
cores across all executors in the cluster. The second should return the current
number of executors registered in the cluster.
Main motivations for adding these methods:
1. It is a best practice to manage job parallelism relative to the available
cores, e.g., df.repartition(5 * sc.coreCount). In particular, it is an
anti-pattern to leave a bunch of cores on a large cluster twiddling their
thumbs and doing nothing. Users usually pass predefined constants to
_repartition()_ and _coalesce()_, and the constant is chosen based on the
current cluster size. If the code runs on a different cluster and/or on a
resized cluster, the constant has to be modified each time. This happens
frequently when a job that normally runs on, say, an hour of data on a small
cluster needs to run on a week of data on a much larger cluster.
2. *spark.default.parallelism* can be used to get the total number of cores in
the cluster, but it can be redefined by the user. The info can also be obtained
by registering a listener, but repeating that code in every application is
ugly. We should follow the DRY principle.
3. Regarding executorsCount(): some jobs, e.g., local node ML training, use
a lot of parallelism. It is common practice to distribute such jobs so that
there is one partition per executor.
4. In some places users collect this info, along with other settings,
together with job timing (at the app level) for analysis. E.g., you can use ML
to determine the optimal cluster size for different objectives, e.g., fastest
throughput vs. lowest cost per unit of processing.
5. The simpler argument is that basic cluster properties should be easily
discoverable via APIs.
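
To make the intent concrete, here is a rough, Spark-free sketch in Python of the
bookkeeping a listener-based workaround would have to do (point 2), and of how
the two counts would feed partition sizing (points 1 and 3). Everything in it is
hypothetical: ExecutorTracker, its methods, and the helper functions are
illustration-only names, not existing or proposed Spark APIs, and the executor
ids and core counts are made up.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExecutorTracker:
    """Toy stand-in for a listener that records executor registrations."""

    # executor id -> number of cores reported by that executor
    cores_by_executor: Dict[str, int] = field(default_factory=dict)

    def on_executor_added(self, executor_id: str, total_cores: int) -> None:
        self.cores_by_executor[executor_id] = total_cores

    def on_executor_removed(self, executor_id: str) -> None:
        # Removing an executor that was never registered is a no-op.
        self.cores_by_executor.pop(executor_id, None)

    def core_count(self) -> int:
        """Total cores across all currently registered executors."""
        return sum(self.cores_by_executor.values())

    def executor_count(self) -> int:
        """Number of currently registered executors."""
        return len(self.cores_by_executor)


def partitions_for_cores(tracker: ExecutorTracker, factor: int = 5) -> int:
    """Partition count proportional to available cores (point 1)."""
    return max(1, factor * tracker.core_count())


def partitions_per_executor(tracker: ExecutorTracker) -> int:
    """One partition per executor (point 3)."""
    return max(1, tracker.executor_count())


# Simulate a cluster that grows to three executors and then loses one.
tracker = ExecutorTracker()
tracker.on_executor_added("exec-1", 8)
tracker.on_executor_added("exec-2", 8)
tracker.on_executor_added("exec-3", 4)
tracker.on_executor_removed("exec-2")

print(tracker.core_count())              # 12
print(tracker.executor_count())          # 2
print(partitions_for_cores(tracker))     # 60
print(partitions_per_executor(tracker))  # 2
```

With built-in methods, each application would not need to carry its own copy of
this tracking code, and the partition count would follow the cluster size
without editing a hard-coded constant.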
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)