Following up on this thread to see if anyone has thoughts or opinions on the approach mentioned below.
Guru Medasani gdm...@gmail.com

> On Aug 3, 2015, at 10:20 PM, Guru Medasani <gdm...@gmail.com> wrote:
>
> Hi,
>
> I was looking at the spark-submit and spark-shell --help output on two versions (Spark 1.3.1 and Spark 1.5-snapshot) and at the Spark documentation for submitting Spark applications to YARN. There seems to be a mismatch between the preferred syntax and the documentation.
>
> The Spark documentation (http://spark.apache.org/docs/latest/submitting-applications.html#master-urls) says that we need to specify either yarn-cluster or yarn-client to connect to a YARN cluster:
>
> yarn-client    Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
> yarn-cluster   Connect to a YARN cluster in cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
>
> In the spark-submit --help output, however, it says the following: --master yarn with --deploy-mode cluster or client.
>
> Usage: spark-submit [options] <app jar | python file> [app arguments]
> Usage: spark-submit --kill [submission ID] --master [spark://...]
> Usage: spark-submit --status [submission ID] --master [spark://...]
>
> Options:
>   --master MASTER_URL          spark://host:port, mesos://host:port, yarn, or local.
>   --deploy-mode DEPLOY_MODE    Whether to launch the driver program locally ("client") or
>                                on one of the worker machines inside the cluster ("cluster")
>                                (Default: client).
>
> I want to bring this to your attention as it is a bit confusing for someone running Spark on YARN. For example, they look at the spark-submit help output and start using that syntax, but when they look at the online documentation or the user mailing list, they see different spark-submit syntax.
> From a quick discussion with other engineers at Cloudera, it seems --deploy-mode is preferred as it is more consistent with the way things are done with other cluster managers, i.e. there are no standalone-cluster or standalone-client masters. This applies to Mesos as well.
>
> Either syntax works, but I would like to propose using '--master yarn --deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client', as it is consistent with the other cluster managers. This would require updating all Spark pages related to submitting Spark applications to YARN.
>
> So far I've identified the following pages:
>
> 1) http://spark.apache.org/docs/latest/running-on-yarn.html
> 2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
>
> There is a JIRA to track the progress on this as well:
>
> https://issues.apache.org/jira/browse/SPARK-9570
>
> The option we choose dictates whether we update the documentation or the spark-submit and spark-shell help pages.
>
> Any thoughts on which direction we should go?
>
> Guru Medasani
> gdm...@gmail.com
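For anyone skimming the thread, here is a quick sketch of the two equivalent invocations being compared (the application class and jar name below are placeholders, not from the original mail):

```shell
# Proposed/preferred syntax: cluster manager and deploy mode as separate flags
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.example.MyApp \  # placeholder main class
  my-app.jar                   # placeholder application jar

# Equivalent legacy syntax: deploy mode folded into the master URL
spark-submit \
  --master yarn-cluster \
  --class org.example.MyApp \
  my-app.jar
```

Both forms resolve the cluster location from HADOOP_CONF_DIR or YARN_CONF_DIR; the only difference is whether the deploy mode is expressed as a separate flag or baked into the master string.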