Re: Running Spark in local mode

2016-06-19 Thread Ashok Kumar
Thank you all, sirs.
Appreciated, Mich, your clarification.

 

On Sunday, 19 June 2016, 19:31, Mich Talebzadeh  
wrote:
 

 Thanks Jonathan for your points
I am aware of the fact that yarn-client and yarn-cluster are both deprecated
(they still work in 1.6.1), hence the new nomenclature.
Bear in mind this is what I stated in my notes:
"YARN Cluster Mode, the Spark driver runs inside an application master process 
which is managed by YARN on the cluster, and the client can go away after 
initiating the application. This is invoked with –master yarn and --deploy-mode 
cluster   
   - YARN Client Mode, the driver runs in the client process, and the 
application master is only used for requesting resources from YARN. 
   -

   - Unlike Spark standalone mode, in which the master’s address is specified 
in the --master parameter, in YARN mode the ResourceManager’s address is picked 
up from the Hadoop configuration. Thus, the --master parameter is yarn. This is 
invoked with --deploy-mode client"

These are exactly from the Spark documentation, and I quote:
"There are two deploy modes that can be used to launch Spark applications on 
YARN. In cluster mode, the Spark driver runs inside an application master 
process which is managed by YARN on the cluster, and the client can go away 
after initiating the application. 
In client mode, the driver runs in the client process, and the application 
master is only used for requesting resources from YARN.
Unlike Spark standalone and Mesos modes, in which the master’s address is 
specified in the --master parameter, in YARN mode the ResourceManager’s address 
is picked up from the Hadoop configuration. Thus, the --master parameter is 
yarn."
Cheers
Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 19 June 2016 at 19:09, Jonathan Kelly  wrote:

Mich, what Jacek is saying is not that you implied that YARN relies on two 
masters. He's just clarifying that yarn-client and yarn-cluster modes are 
really both using the same (type of) master (simply "yarn"). In fact, if you 
specify "--master yarn-client" or "--master yarn-cluster", spark-submit will 
translate that into using a master URL of "yarn" and a deploy-mode of "client" 
or "cluster".

And thanks, Jacek, for the tips on the "less-common master URLs". I had no idea 
that was an option!

~ Jonathan
On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh  
wrote:

Good points but I am an experimentalist.

In local mode I have this. In local mode with --master local, Spark will start
with one thread, equivalent to --master local[1]. You can also start with more
than one thread by specifying the number of threads k in --master local[k], or
use all available threads with --master local[*], which on my machine would be
local[12].

The important thing about local mode is that the number of JVMs launched is
controlled by you, and you can start as many spark-submit processes as you
wish within the constraints of the resources you have.
${SPARK_HOME}/bin/spark-submit \
    --packages com.databricks:spark-csv_2.11:1.3.0 \
    --driver-memory 2G \
    --num-executors 1 \
    --executor-memory 2G \
    --master local \
    --executor-cores 2 \
    --conf "spark.scheduler.mode=FIFO" \
    --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
    --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
    --class "${FILE_NAME}" \
    --conf "spark.ui.port=4040" \
    ${JAR_FILE} \
    >> ${LOG_FILE}
Now that does work fine, although some of those parameters are implicit (for
example scheduler.mode = FIFO or FAIR), and I can start different Spark jobs
in local mode. Great for testing.
With regard to your comments on Standalone:

Spark Standalone – a simple cluster manager included with Spark that makes it
easy to set up a cluster.

s/simple/built-in

What is stated as "included" already implies that, i.e. it comes as part of
Spark when running in standalone mode.
Your other points on YARN cluster mode and YARN client mode
I'd say there's only one YARN master, i.e. --master yarn. You could
 however say where the driver runs, be it on your local machine where
 you executed spark-submit or on one node in a YARN cluster.
Yes, that is what I believe the text implied. I would be very surprised if
YARN as a resource manager relied on two masters :)

HTH







Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com 
On 19 June 2016 at 11:46, Jacek Laskowski  wrote:

On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
 wrote:

> Spark Local - Spark runs on the local host. This is the simplest set up and
> 

Re: Running Spark in local mode

2016-06-19 Thread Mich Talebzadeh
Thanks Jonathan for your points

I am aware of the fact that yarn-client and yarn-cluster are both deprecated
(they still work in 1.6.1), hence the new nomenclature.

Bear in mind this is what I stated in my notes:

"YARN Cluster Mode, the Spark driver runs inside an application master
process which is managed by YARN on the cluster, and the client can go away
after initiating the application. This is invoked with –master yarn
and --deploy-mode
cluster
-

YARN Client Mode, the driver runs in the client process, and the
application master is only used for requesting resources from YARN.
-


-

Unlike Spark standalone mode, in which the master’s address is specified in
the --master parameter, in YARN mode the ResourceManager’s address is
picked up from the Hadoop configuration. Thus, the --master parameter is
yarn. This is invoked with --deploy-mode client"

These are exactly from the Spark documentation, and I quote:

"There are two deploy modes that can be used to launch Spark applications
on YARN. In cluster mode, the Spark driver runs inside an application
master process which is managed by YARN on the cluster, and the client can
go away after initiating the application.

In client mode, the driver runs in the client process, and the application
master is only used for requesting resources from YARN.

Unlike Spark standalone and Mesos modes, in which
the master’s address is specified in the --master parameter, in YARN mode
the ResourceManager’s address is picked up from the Hadoop configuration.
Thus, the --master parameter is yarn."

Cheers

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 19 June 2016 at 19:09, Jonathan Kelly  wrote:

> Mich, what Jacek is saying is not that you implied that YARN relies on two
> masters. He's just clarifying that yarn-client and yarn-cluster modes are
> really both using the same (type of) master (simply "yarn"). In fact, if
> you specify "--master yarn-client" or "--master yarn-cluster", spark-submit
> will translate that into using a master URL of "yarn" and a deploy-mode of
> "client" or "cluster".
>
> And thanks, Jacek, for the tips on the "less-common master URLs". I had no
> idea that was an option!
>
> ~ Jonathan
>
> On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh 
> wrote:
>
>> Good points but I am an experimentalist
>>
>> In Local mode I have this
>>
>> In local mode with:
>>
>> --master local
>>
>>
>>
>> This will start with one thread or equivalent to –master local[1]. You
>> can also start by more than one thread by specifying the number of threads
>> *k* in –master local[k]. You can also start using all available threads
>> with –master local[*]which in mine would be local[12].
>>
>> The important thing about Local mode is that number of JVM thrown is
>> controlled by you and you can start as many spark-submit as you wish within
>> constraint of what you get
>>
>> ${SPARK_HOME}/bin/spark-submit \
>>
>> --packages com.databricks:spark-csv_2.11:1.3.0 \
>>
>> --driver-memory 2G \
>>
>> --num-executors 1 \
>>
>> --executor-memory 2G \
>>
>> --master local \
>>
>> --executor-cores 2 \
>>
>> --conf "spark.scheduler.mode=FIFO" \
>>
>> --conf
>> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps" \
>>
>> --jars
>> /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>>
>> --class "${FILE_NAME}" \
>>
>> --conf "spark.ui.port=4040” \
>>
>> ${JAR_FILE} \
>>
>> >> ${LOG_FILE}
>>
>> Now that does work fine although some of those parameters are implicit
>> (for example cheduler.mode = FIFOR or FAIR and I can start different spark
>> jobs in Local mode. Great for testing.
>>
>> With regard to your comments on Standalone
>>
>> Spark Standalone – a simple cluster manager included with Spark that
>> makes it easy to set up a cluster.
>>
>> s/simple/built-in
>> What is stated as "included" implies that, i.e. it comes as part of
>> running Spark in standalone mode.
>>
>> Your other points on YARN cluster mode and YARN client mode
>>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> however say where the driver runs, be it on your local machine where
>> you executed spark-submit or on one node in a YARN cluster.
>>
>>
>> Yes that is I believe what the text implied. I would be very surprised if
>> YARN as a resource manager relies on two masters :)
>>
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>

Re: Running Spark in local mode

2016-06-19 Thread Jonathan Kelly
Mich, what Jacek is saying is not that you implied that YARN relies on two
masters. He's just clarifying that yarn-client and yarn-cluster modes are
really both using the same (type of) master (simply "yarn"). In fact, if
you specify "--master yarn-client" or "--master yarn-cluster", spark-submit
will translate that into using a master URL of "yarn" and a deploy-mode of
"client" or "cluster".

And thanks, Jacek, for the tips on the "less-common master URLs". I had no
idea that was an option!

~ Jonathan

On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh 
wrote:

> Good points but I am an experimentalist
>
> In Local mode I have this
>
> In local mode with:
>
> --master local
>
>
>
> This will start with one thread or equivalent to –master local[1]. You can
> also start by more than one thread by specifying the number of threads *k*
> in –master local[k]. You can also start using all available threads with 
> –master
> local[*]which in mine would be local[12].
>
> The important thing about Local mode is that number of JVM thrown is
> controlled by you and you can start as many spark-submit as you wish within
> constraint of what you get
>
> ${SPARK_HOME}/bin/spark-submit \
>
> --packages com.databricks:spark-csv_2.11:1.3.0 \
>
> --driver-memory 2G \
>
> --num-executors 1 \
>
> --executor-memory 2G \
>
> --master local \
>
> --executor-cores 2 \
>
> --conf "spark.scheduler.mode=FIFO" \
>
> --conf
> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps" \
>
> --jars
> /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>
> --class "${FILE_NAME}" \
>
> --conf "spark.ui.port=4040” \
>
> ${JAR_FILE} \
>
> >> ${LOG_FILE}
>
> Now that does work fine although some of those parameters are implicit
> (for example cheduler.mode = FIFOR or FAIR and I can start different spark
> jobs in Local mode. Great for testing.
>
> With regard to your comments on Standalone
>
> Spark Standalone – a simple cluster manager included with Spark that
> makes it easy to set up a cluster.
>
> s/simple/built-in
> What is stated as "included" implies that, i.e. it comes as part of
> running Spark in standalone mode.
>
> Your other points on YARN cluster mode and YARN client mode
>
> I'd say there's only one YARN master, i.e. --master yarn. You could
> however say where the driver runs, be it on your local machine where
> you executed spark-submit or on one node in a YARN cluster.
>
>
> Yes that is I believe what the text implied. I would be very surprised if
> YARN as a resource manager relies on two masters :)
>
>
> HTH
>
>
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 19 June 2016 at 11:46, Jacek Laskowski  wrote:
>
>> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
>>  wrote:
>>
>> > Spark Local - Spark runs on the local host. This is the simplest set up
>> and
>> > best suited for learners who want to understand different concepts of
>> Spark
>> > and those performing unit testing.
>>
>> There are also the less-common master URLs:
>>
>> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
>> threads and maxRetries number of failures.
>> * local-cluster[n, cores, memory] for simulating a Spark local cluster
>> with n workers, # cores per worker, and # memory per worker.
>>
>> As of Spark 2.0.0, you could also have your own scheduling system -
>> see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
>> known implementation of the ExternalClusterManager contract in Spark
>> being YarnClusterManager, i.e. whenever you call Spark with --master
>> yarn.
>>
>> > Spark Standalone – a simple cluster manager included with Spark that
>> makes
>> > it easy to set up a cluster.
>>
>> s/simple/built-in
>>
>> > YARN Cluster Mode, the Spark driver runs inside an application master
>> > process which is managed by YARN on the cluster, and the client can go
>> away
>> > after initiating the application. This is invoked with –master yarn and
>> > --deploy-mode cluster
>> >
>> > YARN Client Mode, the driver runs in the client process, and the
>> application
>> > master is only used for requesting resources from YARN. Unlike Spark
>> > standalone mode, in which the master’s address is specified in the
>> --master
>> > parameter, in YARN mode the ResourceManager’s address is picked up from
>> the
>> > Hadoop configuration. Thus, the --master parameter is yarn. This is
>> invoked
>> > with --deploy-mode client
>>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> 

Re: Running Spark in local mode

2016-06-19 Thread Mich Talebzadeh
Good points but I am an experimentalist

In Local mode I have this

In local mode with:

--master local



This will start with one thread, equivalent to --master local[1]. You can
also start with more than one thread by specifying the number of threads *k*
in --master local[k], or use all available threads with --master local[*],
which on my machine would be local[12].

The important thing about local mode is that the number of JVMs launched is
controlled by you, and you can start as many spark-submit processes as you
wish within the constraints of the resources you have.

${SPARK_HOME}/bin/spark-submit \

--packages com.databricks:spark-csv_2.11:1.3.0 \

--driver-memory 2G \

--num-executors 1 \

--executor-memory 2G \

--master local \

--executor-cores 2 \

--conf "spark.scheduler.mode=FIFO" \

--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps" \

--jars
/home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \

--class "${FILE_NAME}" \

--conf "spark.ui.port=4040” \

${JAR_FILE} \

>> ${LOG_FILE}

Now that does work fine, although some of those parameters are implicit (for
example scheduler.mode = FIFO or FAIR), and I can start different Spark jobs
in local mode. Great for testing.

With regard to your comments on Standalone

Spark Standalone – a simple cluster manager included with Spark that
makes it easy to set up a cluster.

s/simple/built-in
What is stated as "included" implies that, i.e. it comes as part of running
Spark in standalone mode.

Your other points on YARN cluster mode and YARN client mode

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.


Yes, that is what I believe the text implied. I would be very surprised if
YARN as a resource manager relied on two masters :)


HTH









Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 19 June 2016 at 11:46, Jacek Laskowski  wrote:

> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
>  wrote:
>
> > Spark Local - Spark runs on the local host. This is the simplest set up
> and
> > best suited for learners who want to understand different concepts of
> Spark
> > and those performing unit testing.
>
> There are also the less-common master URLs:
>
> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
> threads and maxRetries number of failures.
> * local-cluster[n, cores, memory] for simulating a Spark local cluster
> with n workers, # cores per worker, and # memory per worker.
>
> As of Spark 2.0.0, you could also have your own scheduling system -
> see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
> known implementation of the ExternalClusterManager contract in Spark
> being YarnClusterManager, i.e. whenever you call Spark with --master
> yarn.
>
> > Spark Standalone – a simple cluster manager included with Spark that
> makes
> > it easy to set up a cluster.
>
> s/simple/built-in
>
> > YARN Cluster Mode, the Spark driver runs inside an application master
> > process which is managed by YARN on the cluster, and the client can go
> away
> > after initiating the application. This is invoked with –master yarn and
> > --deploy-mode cluster
> >
> > YARN Client Mode, the driver runs in the client process, and the
> application
> > master is only used for requesting resources from YARN. Unlike Spark
> > standalone mode, in which the master’s address is specified in the
> --master
> > parameter, in YARN mode the ResourceManager’s address is picked up from
> the
> > Hadoop configuration. Thus, the --master parameter is yarn. This is
> invoked
> > with --deploy-mode client
>
> I'd say there's only one YARN master, i.e. --master yarn. You could
> however say where the driver runs, be it on your local machine where
> you executed spark-submit or on one node in a YARN cluster.
>
> The same applies to Spark Standalone and Mesos and is controlled by
> --deploy-mode, i.e. client (default) or cluster.
>
> Please update your notes accordingly ;-)
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>


Re: Running Spark in local mode

2016-06-19 Thread Jacek Laskowski
On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
 wrote:

> Spark Local - Spark runs on the local host. This is the simplest set up and
> best suited for learners who want to understand different concepts of Spark
> and those performing unit testing.

There are also the less-common master URLs:

* local[n, maxRetries] or local[*, maxRetries] — local mode with n
threads and maxRetries number of failures.
* local-cluster[n, cores, memory] for simulating a Spark local cluster
with n workers, # cores per worker, and # memory per worker.
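
To make those concrete, here is a minimal, assumed sketch (the app name and
the numbers are arbitrary) of starting a context with the retry-aware local
master URL; local-cluster[n, cores, memory] is passed the same way via
setMaster, but it forks separate worker JVMs and so generally needs a Spark
distribution available on the machine:

import org.apache.spark.{SparkConf, SparkContext}

object LocalWithRetriesSketch {
  def main(args: Array[String]): Unit = {
    // "local[4,2]": 4 task threads, each task retried up to 2 times
    // before the job as a whole is failed.
    val conf = new SparkConf()
      .setAppName("LocalWithRetriesSketch")
      .setMaster("local[4,2]")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100, 4).sum())
    sc.stop()
  }
}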

As of Spark 2.0.0, you could also have your own scheduling system -
see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
known implementation of the ExternalClusterManager contract in Spark
being YarnClusterManager, i.e. whenever you call Spark with --master
yarn.

> Spark Standalone – a simple cluster manager included with Spark that makes
> it easy to set up a cluster.

s/simple/built-in

> YARN Cluster Mode, the Spark driver runs inside an application master
> process which is managed by YARN on the cluster, and the client can go away
> after initiating the application. This is invoked with –master yarn and
> --deploy-mode cluster
>
> YARN Client Mode, the driver runs in the client process, and the application
> master is only used for requesting resources from YARN. Unlike Spark
> standalone mode, in which the master’s address is specified in the --master
> parameter, in YARN mode the ResourceManager’s address is picked up from the
> Hadoop configuration. Thus, the --master parameter is yarn. This is invoked
> with --deploy-mode client

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.

The same applies to Spark Standalone and Mesos and is controlled by
--deploy-mode, i.e. client (default) or cluster.

Please update your notes accordingly ;-)

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Running Spark in local mode

2016-06-19 Thread Mich Talebzadeh
Spark works in different modes: local (neither Spark nor anything else
manages resources), standalone (Spark itself manages resources), plus others
(see below).

These are from my notes, excluding Mesos, which I have not used:


   - Spark Local - Spark runs on the local host. This is the simplest set
   up and best suited for learners who want to understand different concepts
   of Spark and those performing unit testing.

   - Spark Standalone – a simple cluster manager included with Spark that
   makes it easy to set up a cluster.

   - YARN Cluster Mode, the Spark driver runs inside an application master
   process which is managed by YARN on the cluster, and the client can go away
   after initiating the application. This is invoked with --master yarn and
   --deploy-mode cluster.

   - YARN Client Mode, the driver runs in the client process, and the
   application master is only used for requesting resources from YARN. Unlike
   Spark standalone mode, in which the master’s address is specified in the
   --master parameter, in YARN mode the ResourceManager’s address is picked
   up from the Hadoop configuration. Thus, the --master parameter is yarn.
   This is invoked with --deploy-mode client.

So local mode is the simplest configuration of Spark and does not require a
cluster. The user on the local host can launch and experiment with Spark. In
this mode the driver program (SparkSubmit), the resource manager and the
executor all exist within the same JVM. The JVM itself acts as the worker. In
local mode, you do not need to start a master or slaves/workers. It is pretty
simple, and you can run as many JVMs (spark-submit) as your resources allow
(resources meaning memory and cores).
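
As an illustration (the class name is arbitrary), a self-contained job like
the sketch below runs entirely in that one JVM when the master is local[*];
there is nothing to start beforehand:

import org.apache.spark.{SparkConf, SparkContext}

object LocalModeSketch {
  def main(args: Array[String]): Unit = {
    // Driver, scheduler and the single executor all live in this JVM;
    // local[*] turns every available core into a task slot.
    val sc = new SparkContext(
      new SparkConf().setAppName("LocalModeSketch").setMaster("local[*]"))
    println(s"task slots: ${sc.defaultParallelism}")   // e.g. 12 on a 12-core box
    println(sc.parallelize(1 to 1000).map(_ * 2).sum())
    sc.stop()
  }
}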

HTH



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 19 June 2016 at 10:39, Takeshi Yamamuro  wrote:

> There are many technical differences inside though, how to use is the
> almost same with each other.
> yea, in a standalone mode, spark runs in a cluster way: see
> http://spark.apache.org/docs/1.6.1/cluster-overview.html
>
> // maropu
>
> On Sun, Jun 19, 2016 at 6:14 PM, Ashok Kumar  wrote:
>
>> thank you
>>
>> What are the main differences between a local mode and standalone mode. I
>> understand local mode does not support cluster. Is that the only difference?
>>
>>
>>
>> On Sunday, 19 June 2016, 9:52, Takeshi Yamamuro 
>> wrote:
>>
>>
>> Hi,
>>
>> In a local mode, spark runs in a single JVM that has a master and one
>> executor with `k` threads.
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94
>>
>> // maropu
>>
>>
>> On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar <
>> ashok34...@yahoo.com.invalid> wrote:
>>
>> Hi,
>>
>> I have been told Spark in Local mode is simplest for testing. Spark
>> document covers little on local mode except the cores used in --master
>> local[k].
>>
>> Where are the the driver program, executor and resources. Do I need to
>> start worker threads and how many app I can use safely without exceeding
>> memory allocated etc?
>>
>> Thanking you
>>
>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>


Re: Running Spark in local mode

2016-06-19 Thread Takeshi Yamamuro
There are many technical differences inside, though how you use them is
almost the same. Yes, in standalone mode, Spark runs as a cluster: see
http://spark.apache.org/docs/1.6.1/cluster-overview.html

// maropu

On Sun, Jun 19, 2016 at 6:14 PM, Ashok Kumar  wrote:

> thank you
>
> What are the main differences between a local mode and standalone mode. I
> understand local mode does not support cluster. Is that the only difference?
>
>
>
> On Sunday, 19 June 2016, 9:52, Takeshi Yamamuro 
> wrote:
>
>
> Hi,
>
> In a local mode, spark runs in a single JVM that has a master and one
> executor with `k` threads.
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94
>
> // maropu
>
>
> On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar  > wrote:
>
> Hi,
>
> I have been told Spark in Local mode is simplest for testing. Spark
> document covers little on local mode except the cores used in --master
> local[k].
>
> Where are the the driver program, executor and resources. Do I need to
> start worker threads and how many app I can use safely without exceeding
> memory allocated etc?
>
> Thanking you
>
>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
>
>


-- 
---
Takeshi Yamamuro


Re: Running Spark in local mode

2016-06-19 Thread Ashok Kumar
thank you 
What are the main differences between local mode and standalone mode? I
understand local mode does not support a cluster. Is that the only difference?
 

On Sunday, 19 June 2016, 9:52, Takeshi Yamamuro  
wrote:
 

 Hi,
In a local mode, Spark runs in a single JVM that has a master and one executor
with `k` threads:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94

// maropu

On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar  
wrote:

Hi,
I have been told Spark in Local mode is simplest for testing. Spark document 
covers little on local mode except the cores used in --master local[k]. 
Where are the the driver program, executor and resources. Do I need to start 
worker threads and how many app I can use safely without exceeding memory 
allocated etc?
Thanking you





-- 
---
Takeshi Yamamuro


  

Re: Running Spark in local mode

2016-06-19 Thread Takeshi Yamamuro
Hi,

In a local mode, spark runs in a single JVM that has a master and one
executor with `k` threads.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94

// maropu


On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar 
wrote:

> Hi,
>
> I have been told Spark in Local mode is simplest for testing. Spark
> document covers little on local mode except the cores used in --master
> local[k].
>
> Where are the the driver program, executor and resources. Do I need to
> start worker threads and how many app I can use safely without exceeding
> memory allocated etc?
>
> Thanking you
>
>
>


-- 
---
Takeshi Yamamuro


Running Spark in local mode

2016-06-19 Thread Ashok Kumar
Hi,
I have been told Spark in local mode is the simplest for testing. The Spark
documentation covers little on local mode except the cores used in --master
local[k].

Where are the driver program, executor and resources? Do I need to start
worker threads, and how many apps can I safely run without exceeding the
memory allocated, etc.?
Thanking you



Re: Running Spark in Local Mode

2015-06-11 Thread mrm
Hi, 

Did you resolve this? I have the same questions.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-Local-Mode-tp22279p23278.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Sean,

How does this model actually work? Let's say we want to run one job as N
threads executing one particular task, e.g. streaming data out of Kafka
into a search engine.  How do we configure our Spark job execution?

Right now, I'm seeing this job running as a single thread. And it's quite a
bit slower than just running a simple utility with a thread executor with a
thread pool of N threads doing the same task.

The performance I'm seeing of running the Kafka-Spark Streaming job is 7
times slower than that of the utility.  What's pulling Spark back?

Thanks.


On Mon, May 11, 2015 at 4:55 PM, Sean Owen so...@cloudera.com wrote:

 You have one worker with one executor with 32 execution slots.

 On Mon, May 11, 2015 at 9:52 PM, dgoldenberg dgoldenberg...@gmail.com
 wrote:
  Hi,
 
  Is there anything special one must do, running locally and submitting a
 job
  like so:
 
  spark-submit \
  --class com.myco.Driver \
  --master local[*]  \
  ./lib/myco.jar
 
  In my logs, I'm only seeing log messages with the thread identifier of
  Executor task launch worker-0.
 
  There are 4 cores on the machine so I expected 4 threads to be at play.
  Running with local[32] did not yield 32 worker threads.
 
  Any recommendations? Thanks.
 
 
 
  --
  View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
  Sent from the Apache Spark User List mailing list archive at Nabble.com.
 
  -
  To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
  For additional commands, e-mail: user-h...@spark.apache.org
 



Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Sean Owen
BTW I think my comment was wrong as marcelo demonstrated. In
standalone mode you'd have one worker, and you do have one executor,
but his explanation is right. But, you certainly have execution slots
for each core.

Are you talking about your own user code? You can make threads, but
that's nothing to do with Spark then. If you run code on your driver,
it's not distributed. If you run Spark over an RDD with 1 partition,
only one task works on it.
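
As a small sketch of that last point (numbers arbitrary, assuming the lines
are pasted into spark-shell where sc already exists):

// 1 partition -> only one of the local[N] slots is ever busy for this stage
val one = sc.parallelize(1 to 1000000, 1)
println(one.map(_ + 1).count())

// repartitioning to 8 partitions lets up to 8 tasks run concurrently
val eight = one.repartition(8)
println(eight.map(_ + 1).count())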

On Mon, May 11, 2015 at 10:16 PM, Dmitry Goldenberg
dgoldenberg...@gmail.com wrote:
 Sean,

 How does this model actually work? Let's say we want to run one job as N
 threads executing one particular task, e.g. streaming data out of Kafka into
 a search engine.  How do we configure our Spark job execution?

 Right now, I'm seeing this job running as a single thread. And it's quite a
 bit slower than just running a simple utility with a thread executor with a
 thread pool of N threads doing the same task.

 The performance I'm seeing of running the Kafka-Spark Streaming job is 7
 times slower than that of the utility.  What's pulling Spark back?

 Thanks.


 On Mon, May 11, 2015 at 4:55 PM, Sean Owen so...@cloudera.com wrote:

 You have one worker with one executor with 32 execution slots.

 On Mon, May 11, 2015 at 9:52 PM, dgoldenberg dgoldenberg...@gmail.com
 wrote:
  Hi,
 
  Is there anything special one must do, running locally and submitting a
  job
  like so:
 
  spark-submit \
  --class com.myco.Driver \
  --master local[*]  \
  ./lib/myco.jar
 
  In my logs, I'm only seeing log messages with the thread identifier of
  Executor task launch worker-0.
 
  There are 4 cores on the machine so I expected 4 threads to be at play.
  Running with local[32] did not yield 32 worker threads.
 
  Any recommendations? Thanks.
 
 
 
  --
  View this message in context:
  http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
  Sent from the Apache Spark User List mailing list archive at Nabble.com.
 
  -
  To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
  For additional commands, e-mail: user-h...@spark.apache.org
 



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Thanks, Sean. This was not yet digested data for me :)

The number of partitions in a streaming RDD is determined by the
block interval and the batch interval.  I have seen the bit on
spark.streaming.blockInterval
in the doc but I didn't connect it with the batch interval and the number
of partitions.
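
To check I have it right, a minimal sketch (the socket source is just a
stand-in for Kafka, and the host/port are made up): with a 10-second batch
interval and a 1-second block interval, each batch RDD from a receiver should
come out with about 10 partitions.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BlockIntervalSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("BlockIntervalSketch")
      .setMaster("local[4]")
      .set("spark.streaming.blockInterval", "1000")   // 1000 ms per block
    val ssc = new StreamingContext(conf, Seconds(10)) // 10 s batches
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.foreachRDD(rdd =>
      println(s"partitions in this batch: ${rdd.partitions.length}"))
    ssc.start()
    ssc.awaitTermination()
  }
}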

On Mon, May 11, 2015 at 5:34 PM, Sean Owen so...@cloudera.com wrote:

 You might have a look at the Spark docs to start. 1 batch = 1 RDD, but
 1 RDD can have many partitions. And should, for scale. You do not
 submit multiple jobs to get parallelism.

 The number of partitions in a streaming RDD is determined by the block
 interval and the batch interval. If you have a batch interval of 10s
 and block interval of 1s you'll get 10 partitions of data in the RDD.

 On Mon, May 11, 2015 at 10:29 PM, Dmitry Goldenberg
 dgoldenberg...@gmail.com wrote:
  Understood. We'll use the multi-threaded code we already have..
 
  How are these execution slots filled up? I assume each slot is dedicated
 to
  one submitted task.  If that's the case, how is each task distributed
 then,
  i.e. how is that task run in a multi-node fashion?  Say 1000
 batches/RDD's
  are extracted out of Kafka, how does that relate to the number of
 executors
  vs. task slots?
 
  Presumably we can fill up the slots with multiple instances of the same
  task... How do we know how many to launch?
 
  On Mon, May 11, 2015 at 5:20 PM, Sean Owen so...@cloudera.com wrote:
 
  BTW I think my comment was wrong as marcelo demonstrated. In
  standalone mode you'd have one worker, and you do have one executor,
  but his explanation is right. But, you certainly have execution slots
  for each core.
 
  Are you talking about your own user code? you can make threads, but
  that's nothing do with Spark then. If you run code on your driver,
  it's not distributed. If you run Spark over an RDD with 1 partition,
  only one task works on it.
 
  On Mon, May 11, 2015 at 10:16 PM, Dmitry Goldenberg
  dgoldenberg...@gmail.com wrote:
   Sean,
  
   How does this model actually work? Let's say we want to run one job
 as N
   threads executing one particular task, e.g. streaming data out of
 Kafka
   into
   a search engine.  How do we configure our Spark job execution?
  
   Right now, I'm seeing this job running as a single thread. And it's
   quite a
   bit slower than just running a simple utility with a thread executor
   with a
   thread pool of N threads doing the same task.
  
   The performance I'm seeing of running the Kafka-Spark Streaming job
 is 7
   times slower than that of the utility.  What's pulling Spark back?
  
   Thanks.
  
  
   On Mon, May 11, 2015 at 4:55 PM, Sean Owen so...@cloudera.com
 wrote:
  
   You have one worker with one executor with 32 execution slots.
  
   On Mon, May 11, 2015 at 9:52 PM, dgoldenberg 
 dgoldenberg...@gmail.com
   wrote:
Hi,
   
Is there anything special one must do, running locally and
 submitting
a
job
like so:
   
spark-submit \
--class com.myco.Driver \
--master local[*]  \
./lib/myco.jar
   
In my logs, I'm only seeing log messages with the thread identifier
of
Executor task launch worker-0.
   
There are 4 cores on the machine so I expected 4 threads to be at
play.
Running with local[32] did not yield 32 worker threads.
   
Any recommendations? Thanks.
   
   
   
--
View this message in context:
   
   
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.
   
   
 -
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
   
  
  
 
 



Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Sean Owen
You have one worker with one executor with 32 execution slots.

On Mon, May 11, 2015 at 9:52 PM, dgoldenberg dgoldenberg...@gmail.com wrote:
 Hi,

 Is there anything special one must do, running locally and submitting a job
 like so:

 spark-submit \
 --class com.myco.Driver \
 --master local[*]  \
 ./lib/myco.jar

 In my logs, I'm only seeing log messages with the thread identifier of
 Executor task launch worker-0.

 There are 4 cores on the machine so I expected 4 threads to be at play.
 Running with local[32] did not yield 32 worker threads.

 Any recommendations? Thanks.



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Marcelo Vanzin
Are you actually running anything that requires all those slots? e.g.,
locally, I get this with local[16], but only after I run something that
actually uses those 16 slots:

Executor task launch worker-15 daemon prio=10 tid=0x7f4c80029800
nid=0x8ce waiting on condition [0x7f4c62493000]
Executor task launch worker-14 daemon prio=10 tid=0x7f4c80027800
nid=0x8cd waiting on condition [0x7f4c62594000]
Executor task launch worker-13 daemon prio=10 tid=0x7f4c80025800
nid=0x8cc waiting on condition [0x7f4c62695000]
Executor task launch worker-12 daemon prio=10 tid=0x7f4c80023800
nid=0x8cb waiting on condition [0x7f4c62796000]
Executor task launch worker-11 daemon prio=10 tid=0x7f4c80021800
nid=0x8ca waiting on condition [0x7f4c62897000]
Executor task launch worker-10 daemon prio=10 tid=0x7f4c8001f800
nid=0x8c9 waiting on condition [0x7f4c62998000]
Executor task launch worker-9 daemon prio=10 tid=0x7f4c8001d800
nid=0x8c8 waiting on condition [0x7f4c62a99000]
Executor task launch worker-8 daemon prio=10 tid=0x7f4c8001b800
nid=0x8c7 waiting on condition [0x7f4c62b9a000]
Executor task launch worker-7 daemon prio=10 tid=0x7f4c80019800
nid=0x8c6 waiting on condition [0x7f4c62c9b000]
Executor task launch worker-6 daemon prio=10 tid=0x7f4c80018000
nid=0x8c5 waiting on condition [0x7f4c62d9c000]
Executor task launch worker-5 daemon prio=10 tid=0x7f4c80011000
nid=0x8c4 waiting on condition [0x7f4c62e9d000]
Executor task launch worker-4 daemon prio=10 tid=0x7f4c8000f800
nid=0x8c3 waiting on condition [0x7f4c62f9e000]
Executor task launch worker-3 daemon prio=10 tid=0x7f4c8000e000
nid=0x8c2 waiting on condition [0x7f4c6309f000]
Executor task launch worker-2 daemon prio=10 tid=0x7f4c8000c800
nid=0x8c1 waiting on condition [0x7f4c631a]
Executor task launch worker-1 daemon prio=10 tid=0x7f4c80007800
nid=0x8c0 waiting on condition [0x7f4c632a1000]
Executor task launch worker-0 daemon prio=10 tid=0x7f4c80015800
nid=0x8bf waiting on condition [0x7f4c635f4000]
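
(For reference, a trivial way to actually occupy every slot, say from a
spark-shell started with --master local[16], is something like the sketch
below; the sleep is arbitrary and just keeps each task busy long enough for a
thread dump to show all the worker threads.)

// one partition (= one task) per slot, each held busy for ~30 seconds
val slots = sc.defaultParallelism
sc.parallelize(1 to slots, slots).foreach(_ => Thread.sleep(30000))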


On Mon, May 11, 2015 at 1:52 PM, dgoldenberg dgoldenberg...@gmail.com
wrote:

 Hi,

 Is there anything special one must do, running locally and submitting a job
 like so:

 spark-submit \
 --class com.myco.Driver \
 --master local[*]  \
 ./lib/myco.jar

 In my logs, I'm only seeing log messages with the thread identifier of
 Executor task launch worker-0.

 There are 4 cores on the machine so I expected 4 threads to be at play.
 Running with local[32] did not yield 32 worker threads.

 Any recommendations? Thanks.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




-- 
Marcelo


Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Understood. We'll use the multi-threaded code we already have..

How are these execution slots filled up? I assume each slot is dedicated to
one submitted task.  If that's the case, how is each task distributed then,
i.e. how is that task run in a multi-node fashion?  Say 1000 batches/RDD's
are extracted out of Kafka, how does that relate to the number of executors
vs. task slots?

Presumably we can fill up the slots with multiple instances of the same
task... How do we know how many to launch?

On Mon, May 11, 2015 at 5:20 PM, Sean Owen so...@cloudera.com wrote:

 BTW I think my comment was wrong as marcelo demonstrated. In
 standalone mode you'd have one worker, and you do have one executor,
 but his explanation is right. But, you certainly have execution slots
 for each core.

 Are you talking about your own user code? you can make threads, but
 that's nothing do with Spark then. If you run code on your driver,
 it's not distributed. If you run Spark over an RDD with 1 partition,
 only one task works on it.

 On Mon, May 11, 2015 at 10:16 PM, Dmitry Goldenberg
 dgoldenberg...@gmail.com wrote:
  Sean,
 
  How does this model actually work? Let's say we want to run one job as N
  threads executing one particular task, e.g. streaming data out of Kafka
 into
  a search engine.  How do we configure our Spark job execution?
 
  Right now, I'm seeing this job running as a single thread. And it's
 quite a
  bit slower than just running a simple utility with a thread executor
 with a
  thread pool of N threads doing the same task.
 
  The performance I'm seeing of running the Kafka-Spark Streaming job is 7
  times slower than that of the utility.  What's pulling Spark back?
 
  Thanks.
 
 
  On Mon, May 11, 2015 at 4:55 PM, Sean Owen so...@cloudera.com wrote:
 
  You have one worker with one executor with 32 execution slots.
 
  On Mon, May 11, 2015 at 9:52 PM, dgoldenberg dgoldenberg...@gmail.com
  wrote:
   Hi,
  
   Is there anything special one must do, running locally and submitting
 a
   job
   like so:
  
   spark-submit \
   --class com.myco.Driver \
   --master local[*]  \
   ./lib/myco.jar
  
   In my logs, I'm only seeing log messages with the thread identifier of
   Executor task launch worker-0.
  
   There are 4 cores on the machine so I expected 4 threads to be at
 play.
   Running with local[32] did not yield 32 worker threads.
  
   Any recommendations? Thanks.
  
  
  
   --
   View this message in context:
  
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
   Sent from the Apache Spark User List mailing list archive at
 Nabble.com.
  
   -
   To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
   For additional commands, e-mail: user-h...@spark.apache.org
  
 
 



Re: Running Spark in local mode seems to ignore local[N]

2015-05-11 Thread Dmitry Goldenberg
Seems to be running OK with 4 threads, 16 threads... While running with 32
threads I started getting the below.

15/05/11 19:48:46 WARN executor.Executor: Issue communicating with driver
in heartbeater
org.apache.spark.SparkException: Error sending message [message =
Heartbeat(driver,[Lscala.Tuple2;@7668b255,BlockManagerId(driver,
localhost, 43318))]
at
org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:209)
at
org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427)
Caused by: akka.pattern.AskTimeoutException:
Recipient[Actor[akka://sparkDriver/user/HeartbeatReceiver#-677986522]] had
already been terminated.
at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
at
org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
... 1 more


On Mon, May 11, 2015 at 5:34 PM, Sean Owen so...@cloudera.com wrote:

 You might have a look at the Spark docs to start. 1 batch = 1 RDD, but
 1 RDD can have many partitions. And should, for scale. You do not
 submit multiple jobs to get parallelism.

 The number of partitions in a streaming RDD is determined by the block
 interval and the batch interval. If you have a batch interval of 10s
 and block interval of 1s you'll get 10 partitions of data in the RDD.

 On Mon, May 11, 2015 at 10:29 PM, Dmitry Goldenberg
 dgoldenberg...@gmail.com wrote:
  Understood. We'll use the multi-threaded code we already have..
 
  How are these execution slots filled up? I assume each slot is dedicated
 to
  one submitted task.  If that's the case, how is each task distributed
 then,
  i.e. how is that task run in a multi-node fashion?  Say 1000
 batches/RDD's
  are extracted out of Kafka, how does that relate to the number of
 executors
  vs. task slots?
 
  Presumably we can fill up the slots with multiple instances of the same
  task... How do we know how many to launch?
 
  On Mon, May 11, 2015 at 5:20 PM, Sean Owen so...@cloudera.com wrote:
 
  BTW I think my comment was wrong as marcelo demonstrated. In
  standalone mode you'd have one worker, and you do have one executor,
  but his explanation is right. But, you certainly have execution slots
  for each core.
 
  Are you talking about your own user code? you can make threads, but
  that's nothing do with Spark then. If you run code on your driver,
  it's not distributed. If you run Spark over an RDD with 1 partition,
  only one task works on it.
 
  On Mon, May 11, 2015 at 10:16 PM, Dmitry Goldenberg
  dgoldenberg...@gmail.com wrote:
   Sean,
  
   How does this model actually work? Let's say we want to run one job
 as N
   threads executing one particular task, e.g. streaming data out of
 Kafka
   into
   a search engine.  How do we configure our Spark job execution?
  
   Right now, I'm seeing this job running as a single thread. And it's
   quite a
   bit slower than just running a simple utility with a thread executor
   with a
   thread pool of N threads doing the same task.
  
   The performance I'm seeing of running the Kafka-Spark Streaming job
 is 7
   times slower than that of the utility.  What's pulling Spark back?
  
   Thanks.
  
  
   On Mon, May 11, 2015 at 4:55 PM, Sean Owen so...@cloudera.com
 wrote:
  
   You have one worker with one executor with 32 execution slots.
  
   On Mon, May 11, 2015 at 9:52 PM, dgoldenberg 
 dgoldenberg...@gmail.com
   wrote:
Hi,
   
Is there anything special one must do, running locally and
 submitting
a
job
like so:
   
spark-submit \
--class com.myco.Driver \
--master local[*]  \
./lib/myco.jar
   
In my logs, I'm only seeing log messages with the thread identifier
of
Executor task launch worker-0.
   
There are 4 cores on the machine so I expected 4 threads to be at
play.
Running with local[32] did not yield 32 worker threads.
   
Any recommendations? Thanks.
   
   
   
--
View this message in context:
   
   
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-local-mode-seems-to-ignore-local-N-tp22851.html
Sent from the Apache Spark User List mailing list archive at
Nabble.com.
   
   
 -
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
   
  
  
 
 



Re: Running Spark in Local Mode

2015-03-29 Thread Saisai Shao
Hi,

I think for local mode, the number N (the number of threads) basically equals
the number of available cores in ONE executor (worker), not N workers. You
can picture local[N] as one worker with N cores. I'm not sure you can set the
memory usage for each thread; for Spark, the memory is shared within one
executor.
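
A small sketch of what that looks like (the heap figure depends on whatever
you gave the single JVM, e.g. via --driver-memory when launching, and the app
name is arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

// In local[N] there is one JVM: the N task threads share one heap,
// so there is no per-thread memory setting to tune.
val sc = new SparkContext(
  new SparkConf().setAppName("LocalMemSketch").setMaster("local[4]"))
// only one entry here in local mode: the driver doubling as the executor
println(sc.getExecutorMemoryStatus.keys.mkString(", "))
println(s"max heap: ${Runtime.getRuntime.maxMemory() / 1024 / 1024} MB")
sc.stop()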

Thanks
Jerry


2015-03-30 4:21 GMT+08:00 FreePeter wenlei@gmail.com:

 Hi,

 I am trying to use Spark for my own applications, and I am currently
 profiling the performance with local mode, and I have a couple of
 questions:

 1. When I set spark.master local[N], it means the will use up to N worker
 *threads* on the single machine. Is this equivalent to say there are N
 worker *nodes*  as described in
 http://spark.apache.org/docs/latest/cluster-overview.html
 (So each worker node/thread are viewed separately and can have its own
 executor for each application)

 2. Is there anyway to set up the max memory used by each worker
 thread/node?
 I only find we can set the memory for each executor? (spark.executor.mem)

 Thank you!





 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-Local-Mode-tp22279.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Running Spark in Local Mode

2015-03-29 Thread FreePeter
Hi,

I am trying to use Spark for my own applications, and I am currently
profiling the performance with local mode, and I have a couple of questions:

1. When I set spark.master to local[N], it means it will use up to N worker
*threads* on the single machine. Is this equivalent to saying there are N
worker *nodes* as described in
http://spark.apache.org/docs/latest/cluster-overview.html?
(So each worker node/thread is viewed separately and can have its own
executor for each application.)

2. Is there any way to set the max memory used by each worker thread/node?
I can only find how to set the memory for each executor (spark.executor.memory).

Thank you!





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-Local-Mode-tp22279.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Running Spark in Local Mode vs. Single Node Cluster

2014-09-22 Thread kriskalish
I'm in a situation where I'm running Spark streaming on a single machine
right now. The plan is to ultimately run it on a cluster, but for the next
couple months it will probably stay on one machine.

I tried to do some digging and I can't find any indication of whether it's
better to run spark as a single node cluster or just in local mode. As far
as I can tell, the only real difference is that it's difficult to configure
the executor memory in local mode.

I have been having problems with Spark crashing in local mode so far, which
has led me to do this research. I'll be migrating from Spark 1.0.2 to 1.1.0
in the next day or so to see if that helps.

Does anyone have any experience on the matter?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-Spark-in-Local-Mode-vs-Single-Node-Cluster-tp14834.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org