[ 
https://issues.apache.org/jira/browse/SPARK-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221502#comment-14221502
 ] 

Zhan Zhang edited comment on SPARK-4461 at 11/21/14 10:23 PM:
--------------------------------------------------------------

Thanks for the information, Marcelo. I changed the title to reflect the change. 
This patch handles a different issue, but the issue in the PR you referred to 
should also be fixed.

Currently, there is no way to pass YARN AM-specific Java options. This can cause 
problems when reading the classpath from the Hadoop configuration files, because 
Hadoop replaces variables in its property values with the system properties 
passed in through Java options. How those values are specified depends on the 
Hadoop distribution.
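
The substitution described above can be sketched roughly as follows. This is 
an illustration in Python, not Hadoop's actual code: the function name expand 
and the version value "2.6.0" are made up for the example, and only the 
classpath string comes from this comment.

```python
import re

# Matches placeholders such as ${hadoop.version}
_VAR = re.compile(r"\$\{([^}$\s]+)\}")


def expand(value, system_props, conf=None, max_depth=20):
    """Replace ${name} with a system property, falling back to other config keys.

    Hypothetical helper for illustration only; it mimics the general idea of
    variable expansion in Hadoop configuration values.
    """
    conf = conf or {}
    for _ in range(max_depth):
        match = _VAR.search(value)
        if match is None:
            return value  # nothing left to expand
        name = match.group(1)
        replacement = system_props.get(name, conf.get(name))
        if replacement is None:
            return value  # unresolved: the placeholder stays in the value
        value = value[:match.start()] + replacement + value[match.end():]
    return value


classpath = "/etc/hadoop/${hadoop.version}/mapreduce/*"

# With the property passed in (as -Dhadoop.version=... would do for a JVM),
# the placeholder is resolved.
print(expand(classpath, {"hadoop.version": "2.6.0"}))

# Without it, the placeholder is left as-is and the path is unusable.
print(expand(classpath, {}))
```

The second call shows the situation the AM is in today: without the system 
property, the classpath entry keeps its literal ${hadoop.version} placeholder.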

The new options are SPARK_YARN_JAVA_OPTS or spark.yarn.extraJavaOptions. I made 
them global Spark-level settings because, once they are set up in 
spark-defaults.conf, we typically don't want users to have to specify them on 
the command line each time they submit a Spark job.
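
As a rough illustration of what such an option string carries to the AM, the 
Python sketch below pulls the -Dkey=value pairs out of a JVM options string. 
The helper name system_properties and the option values are assumptions for 
the example; this is not how Spark itself parses the setting.

```python
import shlex


def system_properties(java_opts):
    """Collect -Dkey=value pairs from a JVM options string (illustrative helper)."""
    props = {}
    for token in shlex.split(java_opts):
        # Only -D tokens define system properties; flags such as -Xmx are skipped.
        if token.startswith("-D") and len(token) > 2:
            key, _, value = token[2:].partition("=")
            props[key] = value
    return props


# Hypothetical value, as it might be set once for spark.yarn.extraJavaOptions
am_opts = "-Dhadoop.version=2.6.0 -Xmx512m"
print(system_properties(am_opts))  # {'hadoop.version': '2.6.0'}
```

Setting the string once at the global level means every submitted job gets the 
same properties without extra command-line arguments.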

In addition, allowing these extra options to be passed to the AM provides more 
flexibility.

For example, in the following valid mapred-site.xml file, the classpath is 
specified using a system property. Hadoop can handle it correctly because it 
has the Java options passed in.


<property>
  <name>mapreduce.application.classpath</name>
  <value>/etc/hadoop/${hadoop.version}/mapreduce/*</value>
</property>

In the meantime, we cannot rely on mapreduce.admin.map.child.java.opts in 
mapred-site.xml, because it has its own extra Java options specified, which 
do not apply to Spark.


was (Author: zzhan):
Thanks for the information Marcelo. I changed the title to reflect the change.

> Pass java options to yarn master to handle system properties correctly.
> -----------------------------------------------------------------------
>
>                 Key: SPARK-4461
>                 URL: https://issues.apache.org/jira/browse/SPARK-4461
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>            Reporter: Zhan Zhang
>
> Currently Spark reads mapred-site.xml to get the classpath. Starting with 
> Hadoop 2.6, the library is shipped to the cluster through the distributed 
> cache at run time and may not be available on every node manager. 
> Instead of relying on mapred-site.xml, Spark should handle this on its own, 
> for example through ADD_JARS, SPARK_CLASSPATH, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
