Standalone mode also supports running the driver on a cluster node. See "cluster" deploy mode in http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications . For master fault tolerance, see also http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
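For reference, a standalone-mode submission with the driver launched on a cluster node looks roughly like this. The master URL, class name, and jar path below are placeholders, not values from this thread:

```shell
# Sketch: submit to a standalone master in "cluster" deploy mode,
# so the driver runs on one of the worker nodes rather than on the
# submitting machine. --supervise asks the standalone master to
# restart the driver if it exits with a non-zero status.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  /path/to/my-app.jar
```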
On Mon, Nov 30, 2015 at 9:47 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> My understanding of Spark on YARN, and even Spark in general, is very
> limited, so keep that in mind.
>
> I'm not sure why you compare yarn-cluster and Spark standalone. In
> yarn-cluster mode the driver runs on a node in the YARN cluster, while
> Spark standalone keeps the driver on the machine from which you launched
> the application. YARN also supports retrying applications, while
> standalone doesn't. There's also support for rack locality preference
> (though I don't know whether or where Spark uses it).
>
> My limited understanding suggests using Spark on YARN if you're
> considering Hadoop/HDFS and submitting jobs through YARN. Standalone is
> an entry-level option; requiring YARN up front could hinder introducing
> Spark to organizations without Hadoop YARN.
>
> Just my two cents.
>
> Pozdrawiam,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ | http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Fri, Nov 27, 2015 at 8:36 AM, cs user <acldstk...@gmail.com> wrote:
> > Hi All,
> >
> > Apologies if this question has been asked before. I'd like to know if
> > there are any downsides to running Spark over YARN with the
> > --master yarn-cluster option vs. having a separate Spark standalone
> > cluster to execute jobs.
> >
> > We're looking at installing an HDFS/Hadoop cluster with Ambari and
> > submitting jobs to the cluster using YARN, or having an Ambari cluster
> > and a separate standalone Spark cluster, which will run the Spark jobs
> > on data within HDFS.
> >
> > With YARN, will we still get all the benefits of Spark?
> >
> > Will it be possible to process streaming data?
> >
> > Many thanks in advance for any responses.
> >
> > Cheers!
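For comparison, the yarn-cluster submission the original question asks about looks roughly like this; the class name and jar path are placeholders, and the retry property is shown only as an illustration of YARN's application-retry support mentioned above:

```shell
# Sketch: submit the same application to YARN in cluster mode, so the
# driver runs inside a YARN container on the cluster. HADOOP_CONF_DIR
# must point at the cluster's Hadoop configuration.
spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  --conf spark.yarn.maxAppAttempts=2 \
  /path/to/my-app.jar
```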