Good points, but I am an experimentalist.

In local mode with:

--master local



This starts with one thread, equivalent to --master local[1]. You can
also start with more than one thread by specifying the number of threads *k*
in --master local[k], or use all available threads with --master local[*],
which on my machine would be local[12].
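For example, the three forms differ only in the master URL (a sketch, reusing the same ${SPARK_HOME}, ${FILE_NAME} and ${JAR_FILE} variables as in the full invocation further down; note local[*] should be quoted so the shell does not try to glob it):

```shell
# one worker thread, same as plain --master local
${SPARK_HOME}/bin/spark-submit --master "local[1]" --class "${FILE_NAME}" ${JAR_FILE}

# k worker threads, here k=4
${SPARK_HOME}/bin/spark-submit --master "local[4]" --class "${FILE_NAME}" ${JAR_FILE}

# one thread per logical core (local[12] on a 12-core box)
${SPARK_HOME}/bin/spark-submit --master "local[*]" --class "${FILE_NAME}" ${JAR_FILE}
```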

The important thing about local mode is that the number of JVMs spawned is
controlled by you, and you can start as many spark-submit processes as you
wish within the constraints of the resources you have:

${SPARK_HOME}/bin/spark-submit \
                --packages com.databricks:spark-csv_2.11:1.3.0 \
                --driver-memory 2G \
                --num-executors 1 \
                --executor-memory 2G \
                --master local \
                --executor-cores 2 \
                --conf "spark.scheduler.mode=FIFO" \
                --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
                --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
                --class "${FILE_NAME}" \
                --conf "spark.ui.port=4040" \
                ${JAR_FILE} \
                >> ${LOG_FILE}

Now that works fine, although some of those parameters are implicit (for
example spark.scheduler.mode defaults to FIFO and can be set to FAIR), and I
can start different Spark jobs in local mode. Great for testing.
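As a sketch, the scheduler mode could equally be set once in spark-defaults.conf rather than per invocation (spark.scheduler.mode is the standard property name; ${SPARK_HOME}, ${FILE_NAME} and ${JAR_FILE} are the same variables as above):

```shell
# In ${SPARK_HOME}/conf/spark-defaults.conf, applied to every job:
#   spark.scheduler.mode    FAIR

# or overridden for a single job on the command line:
${SPARK_HOME}/bin/spark-submit \
    --conf "spark.scheduler.mode=FAIR" \
    --master "local[2]" \
    --class "${FILE_NAME}" \
    ${JAR_FILE}
```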

With regard to your comments on Standalone

Spark Standalone – a simple cluster manager included with Spark that
makes it easy to set up a cluster.

s/simple/built-in
What is stated as "included" implies exactly that, i.e. the standalone
cluster manager comes as part of the Spark distribution and is what you get
when running Spark in standalone mode.

Your other points on YARN cluster mode and YARN client mode

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.


Yes, that is, I believe, what the text implied. I would be very surprised if
YARN as a resource manager relied on two masters :)
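To illustrate the point, the two YARN modes share the same --master and differ only in --deploy-mode (a sketch; the ResourceManager address is picked up from the Hadoop configuration on the submitting machine):

```shell
# Driver runs inside the YARN application master on a cluster node;
# the client can go away after submitting:
${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode cluster \
    --class "${FILE_NAME}" ${JAR_FILE}

# Driver runs in the local client process (the default deploy mode):
${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode client \
    --class "${FILE_NAME}" ${JAR_FILE}
```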


HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:

> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
> <mich.talebza...@gmail.com> wrote:
>
> > Spark Local - Spark runs on the local host. This is the simplest set up
> and
> > best suited for learners who want to understand different concepts of
> Spark
> > and those performing unit testing.
>
> There are also the less-common master URLs:
>
> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
> threads and maxRetries number of failures.
> * local-cluster[n, cores, memory] for simulating a Spark local cluster
> with n workers, # cores per worker, and # memory per worker.
>
> As of Spark 2.0.0, you could also have your own scheduling system -
> see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
> known implementation of the ExternalClusterManager contract in Spark
> being YarnClusterManager, i.e. whenever you call Spark with --master
> yarn.
>
> > Spark Standalone – a simple cluster manager included with Spark that
> makes
> > it easy to set up a cluster.
>
> s/simple/built-in
>
> > YARN Cluster Mode, the Spark driver runs inside an application master
> > process which is managed by YARN on the cluster, and the client can go
> away
> > after initiating the application. This is invoked with –master yarn and
> > --deploy-mode cluster
> >
> > YARN Client Mode, the driver runs in the client process, and the
> application
> > master is only used for requesting resources from YARN. Unlike Spark
> > standalone mode, in which the master’s address is specified in the
> --master
> > parameter, in YARN mode the ResourceManager’s address is picked up from
> the
> > Hadoop configuration. Thus, the --master parameter is yarn. This is
> invoked
> > with --deploy-mode client
>
> I'd say there's only one YARN master, i.e. --master yarn. You could
> however say where the driver runs, be it on your local machine where
> you executed spark-submit or on one node in a YARN cluster.
>
> The same applies to Spark Standalone and Mesos and is controlled by
> --deploy-mode, i.e. client (default) or cluster.
>
> Please update your notes accordingly ;-)
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
