Hi Ashok

Your points:

"
I know I can start spark-shell by launching the shell itself

spark-shell

Now I know that in standalone mode I can also connect to master

spark-shell --master spark://<HOST>:7077

My point is what are the differences between these two start-up modes for
spark-shell? If I start spark-shell and connect to master what performance
gain will I get if any or it does not matter. Is it the same as for
spark-submit"

When you use spark-shell, or for that matter spark-sql, *you are starting
spark-submit under the bonnet*. These two shells are provided to make it
easier to work with Spark.

However, if you look at what $SPARK_HOME/bin/spark-shell does in the
script, you will notice my point:

    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main
--name "Spark shell" "$@"

So that is basically a spark-submit JVM invoked with the application name
"Spark shell".

Since it uses spark-submit, it accepts all the parameters related to
spark-submit as described here
<http://spark.apache.org/docs/latest/submitting-applications.html>

For example, the default Web GUI port for Spark is 4040. However, I start
mine on port 55555 and have modified the script to give it a different name:

"${SPARK_HOME}"/bin/spark-submit *--conf "spark.ui.port=55555"* --class
org.apache.spark.repl.Main --name *"my own Spark shell"* "$@"
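
Note that editing the script is not strictly necessary; the same port
override can be passed at invocation time:

    spark-shell --conf "spark.ui.port=55555"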

In local mode (where you are not starting master and slaves/workers) the
application will try to grab all the available CPUs (in theory) unless you
restrict it with --master local[n]. You can see that in the web GUI under
the Environment tab as spark.master local[n]. In this mode it is pretty
simple and you can run as many JVMs (spark-submit) as your resources allow.
The first GUI starts on port 4040, the next on 4041, and so forth.
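
For example, to restrict the shell to two cores:

    spark-shell --master local[2]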

The crucial point is that by default Spark will deploy in --master local
mode. You can look at the resource usage through the GUI and also with an
OS tool such as "free" or similar.

In Standalone cluster mode, *where Spark deploys its own scheduling*, you
run start-master and start-slaves (which starts the workers) and you end up
with a more distributed system, with a number of worker processes on
different nodes using parallelism to speed up the work. This is in contrast
to "local" mode, where everything happens on the same physical host and
your best hope is using all the available cores.
Hence, in summary, by using Spark in standalone mode (actually this
terminology is a bit misleading; it would be better if they called it Spark
Own Scheduler Mode (OSM)), you will have better performance due to the
clustered nature of Spark. A minimal start-up sequence is sketched below.
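
Assuming SPARK_HOME is set and <HOST> is the box running the master, the
sequence would be along these lines:

    $SPARK_HOME/sbin/start-master.sh    # bring up the standalone master
    $SPARK_HOME/sbin/start-slaves.sh    # start a worker on every host listed in conf/slaves
    spark-shell --master spark://<HOST>:7077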

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 11 June 2016 at 22:38, Gavin Yue <yue.yuany...@gmail.com> wrote:

> Sorry I have a typo.
>
> Which means spark does not use yarn or mesos in standalone mode...
>
>
>
> On Jun 11, 2016, at 14:35, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Hi Gavin,
>
> I believe in standalone mode a simple cluster manager is included with
> Spark that makes it easy to set up a cluster. It does not rely on YARN or
> Mesos.
>
> In summary this is from my notes:
>
>
>    - Spark Local - Spark runs on the local host. This is the simplest
>    set-up and best suited for learners who want to understand different
>    concepts of Spark and those performing unit testing.
>    - Spark Standalone - a simple cluster manager included with Spark that
>    makes it easy to set up a cluster.
>    - YARN Cluster Mode - the Spark driver runs inside an application
>    master process which is managed by YARN on the cluster, and the client
>    can go away after initiating the application.
>    - Mesos - I have not used it so cannot comment.
>
> In YARN Client Mode, the driver runs in the client process, and the
> application master is only used for requesting resources from YARN. Unlike
> Local or Spark Standalone modes, where the master’s address is specified
> in the --master parameter, in YARN mode the ResourceManager’s address is
> picked up from the Hadoop configuration. Thus, the --master parameter is
> simply yarn.
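>
> For example, a YARN cluster-mode submission might look like this (the
> class and jar names are only illustrative):
>
>     spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar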
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 11 June 2016 at 22:26, Gavin Yue <yue.yuany...@gmail.com> wrote:
>
>> The standalone mode is against Yarn mode or Mesos mode, which means spark
>> uses Yarn or Mesos as cluster managements.
>>
>> Local mode is actually a standalone mode in which everything runs on a
>> single local machine instead of a remote cluster.
>>
>> That is my understanding.
>>
>>
>> On Sat, Jun 11, 2016 at 12:40 PM, Ashok Kumar <
>> ashok34...@yahoo.com.invalid> wrote:
>>
>>> Thank you, I am grateful.
>>>
>>> I know I can start spark-shell by launching the shell itself
>>>
>>> spark-shell
>>>
>>> Now I know that in standalone mode I can also connect to master
>>>
>>> spark-shell --master spark://<HOST>:7077
>>>
>>> My point is what are the differences between these two start-up modes
>>> for spark-shell? If I start spark-shell and connect to master what
>>> performance gain will I get if any or it does not matter. Is it the same as
>>> for spark-submit
>>>
>>>
>>> regards
>>>
>>>
>>> On Saturday, 11 June 2016, 19:39, Mohammad Tariq <donta...@gmail.com>
>>> wrote:
>>>
>>>
>>> Hi Ashok,
>>>
>>> In local mode all the processes run inside a single JVM, whereas in
>>> standalone mode we have separate master and worker processes running in
>>> their own JVMs.
>>>
>>> To quickly test your code from within your IDE you could probably use
>>> the local mode. However, to get a real feel of how Spark operates I would
>>> suggest you have a standalone setup as well. It's just a matter of
>>> launching a standalone cluster, either manually (by starting a master and
>>> workers by hand), or by using the launch scripts provided with the Spark
>>> package.
>>>
>>> You can find more on this *here*
>>> <http://spark.apache.org/docs/latest/spark-standalone.html>.
>>>
>>> HTH
>>>
>>>
>>>
>>> Tariq, Mohammad
>>> about.me/mti
>>>
>>>
>>> On Sat, Jun 11, 2016 at 11:38 PM, Ashok Kumar <
>>> ashok34...@yahoo.com.invalid> wrote:
>>>
>>> Hi,
>>>
>>> What is the difference between running Spark in Local mode or standalone
>>> mode?
>>>
>>> Are they the same? If not, which is best suited for non-prod work?
>>>
>>> I am aware that one can run Spark in YARN mode as well.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>
>
