Re: question on SPARK_WORKER_CORES

2017-02-18 Thread Yan Facai
Hi, kodali.

SPARK_WORKER_CORES is a setting for the cluster resource manager; see
http://spark.apache.org/docs/latest/cluster-overview.html if you are interested.

For standalone mode,
you should use the following three arguments to allocate resources for normal
Spark tasks:

   - --executor-memory
   - --executor-cores
   - --total-executor-cores

Their meanings are as follows:

   - Executor memory: --executor-memory
   - Executor cores: --executor-cores
   - Number of executors: --total-executor-cores / --executor-cores

For more details, see
http://spark.apache.org/docs/latest/submitting-applications.html.
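
As a rough sketch (the master URL, application jar, and numbers below are
placeholders, not values from this thread), a standalone submission could look
like this:

    ./bin/spark-submit \
      --master spark://<master-host>:7077 \
      --executor-memory 2G \
      --executor-cores 4 \
      --total-executor-cores 24 \
      my-app.jar

    # number of executors = --total-executor-cores / --executor-cores = 24 / 4 = 6

With the default spark.task.cpus=1, such a job can run at most 24 tasks in
parallel, one per executor core.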






Re: question on SPARK_WORKER_CORES

2017-02-17 Thread kant kodali
One executor per Spark slave should be fine, right? I am not really sure what
benefit one would get by starting more executors (JVMs) on one node. At the end
of the day the JVM creates native/kernel threads through system calls, so
whether those threads are spawned by one process or by multiple processes, I
don't see much benefit (in theory it should be the same). With different
processes you would get different address spaces in the kernel, but memory
isn't an issue so far.



Re: question on SPARK_WORKER_CORES

2017-02-17 Thread Alex Kozlov
I found that in some previous CDH versions Spark starts only one executor per
Spark slave, and DECREASING --executor-cores in standalone mode makes the total
number of executors go up.  Just my 2¢.
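
As a rough illustration of that arithmetic (the numbers are invented, and actual
placement also depends on how much memory is free on each worker): on a worker
that advertises 16 cores, halving the per-executor core count roughly doubles
how many executors that worker can host.

    # conf/spark-env.sh on each worker (hypothetical value)
    export SPARK_WORKER_CORES=16

    # --executor-cores 8  ->  at most 16 / 8 = 2 executors on this worker
    # --executor-cores 4  ->  at most 16 / 4 = 4 executors on this worker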



Re: question on SPARK_WORKER_CORES

2017-02-17 Thread kant kodali
Hi Satish,

I am using Spark 2.0.2, and no, I have not passed those variables because I
didn't want to shoot in the dark. According to the documentation, it looks like
SPARK_WORKER_CORES is the one that should do it. If not, can you please explain
how these variables interplay?

--num-executors
--executor-cores
--total-executor-cores
SPARK_WORKER_CORES

Thanks!




RE: question on SPARK_WORKER_CORES

2017-02-17 Thread Satish Lalam
Have you tried passing --executor-cores or --total-executor-cores as arguments,
depending on the Spark version?





Re: question on SPARK_WORKER_CORES

2017-02-17 Thread kant kodali
Standalone.



Re: question on SPARK_WORKER_CORES

2017-02-17 Thread Alex Kozlov
What Spark mode are you running the program in?

On Fri, Feb 17, 2017 at 4:55 PM, kant kodali  wrote:

> When I submit a job using the Spark shell, I get something like this:
>
> [Stage 0:>(36814 + 4) / 220129]
>
>
> All I want is to increase the number of parallel tasks running from 4 to 16,
> so I exported an env variable called SPARK_WORKER_CORES=16 in
> conf/spark-env.sh. I thought that should do it, but it doesn't; it still
> shows me 4. Any idea?
>
>
> Thanks much!
>
>
>




RE: Question about SPARK_WORKER_CORES and spark.task.cpus

2015-06-22 Thread Cheng, Hao
It’s actually not that tricky.
SPARK_WORKER_CORES is the maximum size of the executor’s task thread pool; this
is what “one executor with 32 cores and the executor could execute 32 tasks
simultaneously” means. Spark doesn’t care how many real physical CPUs/cores you
have (the OS does), so you need to give a value that reflects the real physical
machine settings; otherwise, thread context switching will probably be an
overhead for CPU-intensive tasks.

“spark.task.cpus”: here is how it’s used, copied from the Spark source code:

  // TODO: The default value of 1 for spark.executor.cores works right now because
  // dynamic allocation is only supported for YARN and the default number of cores
  // per executor in YARN is 1, but it might need to be attained differently for
  // different cluster managers
  private val tasksPerExecutor =
    conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

It means the number of tasks per executor (the number of tasks an executor can
run in parallel) = SPARK_WORKER_CORES / “spark.task.cpus”.

“spark.task.cpus” gives the user an opportunity to reserve resources for a task
that creates more running threads internally (for example, a task that runs a
multithreaded external app).
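
For instance (a sketch with made-up values; the master URL and application jar
are placeholders): with a worker advertising 32 cores and each task reserving 2
of them, an executor would run 32 / 2 = 16 tasks at once.

    # conf/spark-env.sh (hypothetical value)
    export SPARK_WORKER_CORES=32

    # at submit time
    ./bin/spark-submit \
      --master spark://<master-host>:7077 \
      --conf spark.task.cpus=2 \
      my-app.jar

    # concurrent tasks per executor = 32 / 2 = 16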

Hope it helps.


From: Rui Li [mailto:spark.ru...@gmail.com]
Sent: Tuesday, June 23, 2015 8:56 AM
To: user@spark.apache.org
Subject: Question about SPARK_WORKER_CORES and spark.task.cpus

Hi,

I was running a WordCount application on Spark, and the machine I used has 4
physical cores. However, in the spark-env.sh file, I set SPARK_WORKER_CORES = 32.
The web UI says it launched one executor with 32 cores and that the executor
could execute 32 tasks simultaneously. Does Spark create 32 vCores out of 4
physical cores? How much physical CPU resource can each task get then?

Also, I found a parameter, “spark.task.cpus”, but I don’t quite understand it.
If I set it to 2, does Spark allocate 2 CPU cores for one task? I think a “task”
is a “thread” within an executor (a “process”), so how can a thread utilize two
CPU cores simultaneously?

I am looking forward to your reply, thanks!

Best,
Rui