Re: question on SPARK_WORKER_CORES
Hi kodali,

SPARK_WORKER_CORES is designed for the cluster resource manager; see http://spark.apache.org/docs/latest/cluster-overview.html if interested. For standalone mode, you should use the following 3 arguments to allocate resources for normal Spark jobs:

- --executor-memory: memory per executor
- --executor-cores: cores per executor
- --total-executor-cores: total cores across all executors

The number of executors is then --total-executor-cores / --executor-cores. For more details, see http://spark.apache.org/docs/latest/submitting-applications.html.

On Sat, Feb 18, 2017 at 9:20 AM, kant kodali <kanth...@gmail.com> wrote:
> Hi Satish,
>
> I am using Spark 2.0.2, and no, I have not passed those variables because
> I didn't want to shoot in the dark. According to the documentation, it looks
> like SPARK_WORKER_CORES is the one that should do it. If not, can you
> please explain how these variables interplay?
>
> --num-executors
> --executor-cores
> --total-executor-cores
> SPARK_WORKER_CORES
>
> Thanks!
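To make the flag arithmetic above concrete, here is a sketch of a standalone-mode submission. The master URL, memory size, and application jar name are placeholders I chose for illustration, not values from this thread:

```shell
# Hypothetical resource settings for a standalone-mode job.
TOTAL_EXECUTOR_CORES=16   # --total-executor-cores: cores across the whole app
EXECUTOR_CORES=4          # --executor-cores: cores per executor

# Spark derives the executor count from the two core settings above.
NUM_EXECUTORS=$((TOTAL_EXECUTOR_CORES / EXECUTOR_CORES))
echo "executors: $NUM_EXECUTORS"

# The submission itself would then look roughly like:
# spark-submit \
#   --master spark://master-host:7077 \
#   --executor-memory 4G \
#   --executor-cores "$EXECUTOR_CORES" \
#   --total-executor-cores "$TOTAL_EXECUTOR_CORES" \
#   my-app.jar
```

With these numbers, the app gets 4 executors, each able to run up to 4 tasks concurrently.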
Re: question on SPARK_WORKER_CORES
One executor per Spark slave should be fine, right? I am not really sure what benefit one would get by starting more executors (JVMs) on one node. At the end of the day, the JVM creates native/kernel threads through system calls, so whether those threads are spawned by one process or by multiple processes, I don't see much benefit (in theory it should be the same). With different processes one would get different address spaces in the kernel, but memory isn't an issue so far.

On Fri, Feb 17, 2017 at 5:32 PM, Alex Kozlov <ale...@gmail.com> wrote:
> I found in some previous CDH versions that Spark starts only one executor
> per Spark slave, and DECREASING the executor-cores in standalone makes
> the total # of executors go up. Just my 2¢.
Re: question on SPARK_WORKER_CORES
I found in some previous CDH versions that Spark starts only one executor per Spark slave, and DECREASING --executor-cores in standalone mode makes the total number of executors go up. Just my 2¢.

--
Alex Kozlov
(408) 507-4987
(650) 887-2135 efax
ale...@gmail.com
Re: question on SPARK_WORKER_CORES
Hi Satish,

I am using Spark 2.0.2, and no, I have not passed those variables because I didn't want to shoot in the dark. According to the documentation, it looks like SPARK_WORKER_CORES is the one that should do it. If not, can you please explain how these variables interplay?

--num-executors
--executor-cores
--total-executor-cores
SPARK_WORKER_CORES

Thanks!

On Fri, Feb 17, 2017 at 5:13 PM, Satish Lalam <sati...@microsoft.com> wrote:
> Have you tried passing --executor-cores or --total-executor-cores as
> arguments, depending on the Spark version?
RE: question on SPARK_WORKER_CORES
Have you tried passing --executor-cores or --total-executor-cores as arguments, depending on the Spark version?

From: kant kodali [mailto:kanth...@gmail.com]
Sent: Friday, February 17, 2017 5:03 PM
To: Alex Kozlov <ale...@gmail.com>
Cc: user @spark <user@spark.apache.org>
Subject: Re: question on SPARK_WORKER_CORES

Standalone.

On Fri, Feb 17, 2017 at 5:01 PM, Alex Kozlov <ale...@gmail.com> wrote:
> What Spark mode are you running the program in?
Re: question on SPARK_WORKER_CORES
Standalone.

On Fri, Feb 17, 2017 at 5:01 PM, Alex Kozlov wrote:
> What Spark mode are you running the program in?
Re: question on SPARK_WORKER_CORES
What Spark mode are you running the program in?

On Fri, Feb 17, 2017 at 4:55 PM, kant kodali wrote:
> When I submit a job using spark-shell I get something like this:
>
> [Stage 0:> (36814 + 4) / 220129]
>
> Now all I want is to increase the number of parallel tasks running from 4
> to 16, so I exported an env variable called SPARK_WORKER_CORES=16 in
> conf/spark-env.sh. I thought that should do it, but it doesn't. It still
> shows me 4. Any idea?
>
> Thanks much!

--
Alex Kozlov
(408) 507-4987
(650) 887-2135 efax
ale...@gmail.com
RE: Question about SPARK_WORKER_CORES and spark.task.cpus
It's actually not that tricky.

SPARK_WORKER_CORES is the maximum task thread pool size of the executor, which is the same as saying "one executor with 32 cores that could execute 32 tasks simultaneously". Spark doesn't care how many real physical CPUs/cores you have (the OS does), so the user needs to give a value that reflects the real physical machine settings; otherwise the thread context switching will probably be an overhead for CPU-intensive tasks.

"spark.task.cpus": I copied how it's used from the Spark source code:

    // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
    // allocation is only supported for YARN and the default number of cores per executor in YARN is
    // 1, but it might need to be attained differently for different cluster managers
    private val tasksPerExecutor =
      conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)

This means the number of tasks per executor (the number of tasks that can run in parallel per executor) = SPARK_WORKER_CORES / "spark.task.cpus".

"spark.task.cpus" gives the user an opportunity to reserve resources for a task that creates more running threads internally (for example, a task that runs a multithreaded external app).

Hope it's helpful.

From: Rui Li [mailto:spark.ru...@gmail.com]
Sent: Tuesday, June 23, 2015 8:56 AM
To: user@spark.apache.org
Subject: Question about SPARK_WORKER_CORES and spark.task.cpus

Hi,

I was running a WordCount application on Spark, and the machine I used has 4 physical cores. However, in the spark-env.sh file I set SPARK_WORKER_CORES = 32. The web UI says it launched one executor with 32 cores, and the executor could execute 32 tasks simultaneously. Does Spark create 32 vCores out of 4 physical cores? How much physical CPU resource can each task get then?

Also, I found a parameter "spark.task.cpus", but I don't quite understand it. If I set it to 2, does Spark allocate 2 CPU cores for one task? I think a "task" is a "thread" within the executor ("process"), so how can a thread utilize two CPU cores simultaneously?

I am looking forward to your reply, thanks!

Best,
Rui
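The division above can be sketched numerically. The values here are taken from the 32-core example in the question; this only illustrates the formula, it is not Spark code:

```shell
EXECUTOR_CORES=32   # the SPARK_WORKER_CORES / executor cores from the example
TASK_CPUS=2         # spark.task.cpus
# tasksPerExecutor = spark.executor.cores / spark.task.cpus (integer division)
TASKS_PER_EXECUTOR=$((EXECUTOR_CORES / TASK_CPUS))
echo "tasks per executor: $TASKS_PER_EXECUTOR"
```

So with spark.task.cpus=2, the executor schedules at most 16 concurrent tasks: each task still runs as threads inside the executor JVM, but it reserves 2 of the 32 scheduling slots.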