It’s actually not that tricky.
SPARK_WORKER_CORES: this is the maximum size of the executor's task thread 
pool, which is what “one executor with 32 cores and the executor could execute 
32 tasks simultaneously” means. Spark doesn’t care how many physical 
CPUs/cores you actually have (the OS does), so the user needs to set a value 
that reflects the real machine; otherwise thread context switching will 
probably become an overhead for CPU-intensive tasks.
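
As a rough sanity check, you could compare the configured value with what the 
JVM reports. This is only a sketch: the object and method names are made up 
for illustration, the value 32 is typed in by hand rather than read from 
spark-env.sh, and availableProcessors() counts logical processors, which may 
differ from the number of physical cores.

```scala
object WorkerCoresCheck {
  // Ratio of configured worker cores to the processors available on the machine.
  // A value above 1.0 means more task threads than processors.
  def oversubscription(configuredCores: Int, availableProcessors: Int): Double =
    configuredCores.toDouble / availableProcessors

  def main(args: Array[String]): Unit = {
    // Assumption: the SPARK_WORKER_CORES value from the question, entered by hand.
    val configured = 32
    // Logical processors visible to the JVM (not necessarily physical cores).
    val available = Runtime.getRuntime.availableProcessors()
    val ratio = oversubscription(configured, available)
    if (ratio > 1.0)
      println(s"Configured $configured cores on $available processors (${ratio}x oversubscribed)")
    else
      println(s"Configured $configured cores fits within $available processors")
  }
}
```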

“spark.task.cpus”: here is how it is used in the Spark source code:

  // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
  // allocation is only supported for YARN and the default number of cores per executor in YARN is
  // 1, but it might need to be attained differently for different cluster managers
  private val tasksPerExecutor =
    conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

In other words, the number of tasks per executor (how many tasks one executor 
runs in parallel) = the executor's cores (SPARK_WORKER_CORES in your setup, 
since your single executor got all 32) / “spark.task.cpus”.
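
To make the arithmetic concrete, here is a small standalone sketch of the same 
division. It does not use Spark at all; the object and method names are my own, 
and the numbers are just the 32 cores from your question with a few possible 
“spark.task.cpus” values.

```scala
object TasksPerExecutor {
  // Same integer division as in the Spark snippet: any remainder is dropped.
  def tasksPerExecutor(executorCores: Int, taskCpus: Int): Int = {
    require(taskCpus > 0, "spark.task.cpus must be positive")
    executorCores / taskCpus
  }

  def main(args: Array[String]): Unit = {
    println(tasksPerExecutor(32, 1)) // 32 concurrent tasks
    println(tasksPerExecutor(32, 2)) // 16 concurrent tasks
    println(tasksPerExecutor(32, 3)) // 10 concurrent tasks, 2 cores left unused
  }
}
```

So setting “spark.task.cpus” to 2 does not pin a task to two cores; it only 
halves how many tasks the executor runs at once.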

“spark.task.cpus” gives the user a way to reserve resources for a task that 
will probably create more running threads internally (for example, a task that 
runs a multithreaded external app).
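
As an illustration of why one task might need more than one core, here is a 
plain Scala sketch of a task body that fans its work out to several threads. 
It is not Spark code: runTask is a hypothetical stand-in for what a task might 
do, and the thread count of 2 is assumed to match a “spark.task.cpus” setting 
of 2.

```scala
import java.util.concurrent.{Callable, Executors}

object MultiThreadedTaskSketch {
  // Hypothetical task body: runs `work` on `threads` threads and collects the results in order.
  def runTask(threads: Int, work: Int => Int): Seq[Int] = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      // Submit all pieces of work first so they can run concurrently.
      val futures = (0 until threads).map { i =>
        pool.submit(new Callable[Int] { def call(): Int = work(i) })
      }
      // Then wait for each result.
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    // Two internal threads, matching an assumed spark.task.cpus of 2.
    println(runTask(2, i => i * 10))
  }
}
```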

Hope this helps.


From: Rui Li [mailto:spark.ru...@gmail.com]
Sent: Tuesday, June 23, 2015 8:56 AM
To: user@spark.apache.org
Subject: Question about SPARK_WORKER_CORES and spark.task.cpus

Hi,

I was running a WordCount application on Spark, and the machine I used has 4 
physical cores. However, in the spark-env.sh file, I set SPARK_WORKER_CORES = 32. 
The web UI says it launched one executor with 32 cores and the executor could 
execute 32 tasks simultaneously. Does Spark create 32 vCores out of 4 physical 
cores? How much physical CPU resource can each task get then?

Also, I found a parameter “spark.task.cpus”, but I don’t quite understand this 
parameter. If I set it to 2, does Spark allocate 2 CPU cores for one task? I 
think “task” is a “thread” within executor (“process”), so how can a thread 
utilize two CPU cores simultaneously?

I am looking forward to your reply, thanks!

Best,
Rui
