[
https://issues.apache.org/jira/browse/OOZIE-2606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373706#comment-15373706
]
Satish Subhashrao Saley commented on OOZIE-2606:
------------------------------------------------
Currently, we pass the jars using the {{--files}} option and the archives using the
{{--archives}} option, with HDFS paths in both. [Here is the code for it
|https://github.com/apache/oozie/blob/master/sharelib/spark/src/main/java/org/apache/oozie/action/hadoop/SparkMain.java#L175-L183].
I have a few questions regarding {{spark.yarn.jars}}.
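To make the option-building concrete, here is a hedged sketch of how the sharelib HDFS paths could be appended as {{--files}} / {{--archives}} arguments. The class and method names below are illustrative only, not Oozie's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class SparkArgsSketch {
    // Hypothetical helper: turn lists of HDFS paths into spark-submit options,
    // mirroring what SparkMain does with the sharelib jars and archives.
    static List<String> buildArgs(List<String> hdfsJars, List<String> hdfsArchives) {
        List<String> args = new ArrayList<>();
        if (!hdfsJars.isEmpty()) {
            args.add("--files");
            args.add(String.join(",", hdfsJars));       // comma-separated HDFS paths
        }
        if (!hdfsArchives.isEmpty()) {
            args.add("--archives");
            args.add(String.join(",", hdfsArchives));   // comma-separated HDFS paths
        }
        return args;
    }
}
```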
- Is it a replacement for {{--files}}? It does not look like it, based on this
[part of the code
|https://github.com/apache/spark/blob/bad0f7dbba2eda149ee4fc5810674d971d17874a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L495-L504].
The files inside SPARK_JARS, i.e. {{spark.yarn.jars}}, get distributed only when
SPARK_ARCHIVE is not defined.
{code}
val sparkArchive = sparkConf.get(SPARK_ARCHIVE)
if (sparkArchive.isDefined) {
  val archive = sparkArchive.get
  require(!isLocalUri(archive), s"${SPARK_ARCHIVE.key} cannot be a local URI.")
  distribute(Utils.resolveURI(archive).toString,
    resType = LocalResourceType.ARCHIVE,
    destName = Some(LOCALIZED_LIB_DIR))
} else {
  sparkConf.get(SPARK_JARS) match {
    case Some(jars) =>
{code}
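The precedence in the quoted code can be sketched as follows. This is a hypothetical helper, not Spark's actual API; it only mirrors the if/else shown above, where {{spark.yarn.archive}} wins and {{spark.yarn.jars}} is consulted only when no archive is configured:

```java
import java.util.Optional;

public class DistributionChoice {
    // Mirrors Client.scala's logic: an archive, if set, takes precedence
    // over the jar list; with neither set, Spark falls back to packaging
    // the jars it finds under SPARK_HOME.
    static String choose(Optional<String> sparkArchive, Optional<String> sparkJars) {
        if (sparkArchive.isPresent()) {
            return "distribute archive: " + sparkArchive.get();
        }
        return sparkJars
                .map(jars -> "distribute jars: " + jars)
                .orElse("fall back to packaging jars under SPARK_HOME");
    }
}
```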
- Is {{spark.yarn.jars}} a replacement for {{spark.yarn.jar}} with some
additional functionality?
Currently, we can set {{spark.yarn.jar}} to spark-assembly.jar to override
the default location.
{code}
http://spark.apache.org/docs/latest/running-on-yarn.html
The location of the Spark jar file, in case overriding the default location is
desired. By default, Spark on YARN will use a Spark jar installed locally, but
the Spark jar can also be in a world-readable location on HDFS. This allows
YARN to cache it on nodes so that it doesn't need to be distributed each time
an application runs. To point to a jar on HDFS, for example, set this
configuration to hdfs:///some/path.
{code}
I was about to file a JIRA for setting {{spark.yarn.jar}} to spark-assembly.jar,
because currently spark-assembly.jar gets distributed multiple times.
Let me know, shall we add the fix for this here?
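For reference, the override could also be passed from the workflow itself, e.g. (a minimal sketch; the action structure and the HDFS path below are examples only, not a proposed patch):

```xml
<!-- Hypothetical Oozie Spark action fragment: point spark.yarn.jar at a
     world-readable copy of spark-assembly.jar so YARN can cache it -->
<spark-opts>--conf spark.yarn.jar=hdfs:///user/oozie/share/lib/spark/spark-assembly.jar</spark-opts>
```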
> Set spark.yarn.jars to fix Spark 2.0 with Oozie
> -----------------------------------------------
>
> Key: OOZIE-2606
> URL: https://issues.apache.org/jira/browse/OOZIE-2606
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 4.2.0
> Reporter: Jonathan Kelly
> Labels: spark, spark2.0.0
> Fix For: trunk
>
> Attachments: OOZIE-2606.patch
>
>
> Oozie adds all of the jars in the Oozie Spark sharelib to the
> DistributedCache such that all jars will be present in the current working
> directory of the YARN container (as well as in the container classpath).
> However, this is not quite enough to make Spark 2.0 work, since Spark 2.0 by
> default looks for the jars in assembly/target/scala-2.11/jars [1] (as if it
> is a locally built distribution for development) and will not find them in
> the current working directory.
> To fix this, we can set spark.yarn.jars to *.jar so that it finds the jars in
> the current working directory rather than looking in the wrong place. [2]
> [1]
> https://github.com/apache/spark/blob/v2.0.0-rc2/launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java#L357
> [2]
> https://github.com/apache/spark/blob/v2.0.0-rc2/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L476
> Note: This property will be ignored by Spark 1.x.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)