Following up on this thread to see if anyone has some thoughts or opinions on 
the mentioned approach.


Guru Medasani
gdm...@gmail.com



> On Aug 3, 2015, at 10:20 PM, Guru Medasani <gdm...@gmail.com> wrote:
> 
> Hi,
> 
> I was looking at the spark-submit and spark-shell --help output on both 
> versions (Spark 1.3.1 and Spark 1.5-snapshot) and at the Spark documentation 
> for submitting Spark applications to YARN. It seems there is some mismatch 
> between the preferred syntax and the documentation. 
> 
> Spark documentation 
> <http://spark.apache.org/docs/latest/submitting-applications.html#master-urls>
>  says that we need to specify either yarn-cluster or yarn-client to connect 
> to a yarn cluster. 
> 
> 
> yarn-client   Connect to a YARN cluster in client mode. The cluster location 
> will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
> yarn-cluster  Connect to a YARN cluster in cluster mode. The cluster location 
> will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
> The spark-submit --help output, on the other hand, says to use --master yarn 
> together with --deploy-mode cluster or client:
> 
> Usage: spark-submit [options] <app jar | python file> [app arguments]
> Usage: spark-submit --kill [submission ID] --master [spark://...]
> Usage: spark-submit --status [submission ID] --master [spark://...]
> 
> Options:
>   --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or 
>                               local.
>   --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally 
>                               ("client") or on one of the worker machines 
>                               inside the cluster ("cluster")
>                               (Default: client).
> 
> I want to bring this to your attention because it is a bit confusing for 
> someone running Spark on YARN. For example, they look at the spark-submit 
> help output and start using that syntax, but when they look at the online 
> documentation or the user-group mailing list, they see a different 
> spark-submit syntax. 
> 
> From a quick discussion with other engineers at Cloudera, it seems 
> --deploy-mode is preferred, as it is more consistent with the way things are 
> done with other cluster managers, i.e. there are no standalone-cluster or 
> standalone-client masters. This applies to Mesos as well.
> 
> Either syntax works, but I would like to propose using '--master yarn 
> --deploy-mode x' instead of '--master yarn-cluster' or '--master yarn-client', 
> as it is consistent with the other cluster managers. This would require 
> updating all Spark pages related to submitting Spark applications to YARN.
> 
> So far I’ve identified the following pages.
> 
> 1) http://spark.apache.org/docs/latest/running-on-yarn.html
> 2) http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
> 
> There is a JIRA to track the progress on this as well.
> 
> https://issues.apache.org/jira/browse/SPARK-9570
>  
> The option we choose dictates whether we update the documentation or the 
> spark-submit and spark-shell help pages. 
> 
> Any thoughts which direction we should go? 
> 
> Guru Medasani
> gdm...@gmail.com
> 
> 
> 
