Sergey Zhemzhitsky created OOZIE-3053:
-----------------------------------------

             Summary: Oozie does not honor classpath.txt when running Spark action
                 Key: OOZIE-3053
                 URL: https://issues.apache.org/jira/browse/OOZIE-3053
             Project: Oozie
          Issue Type: Bug
          Components: action
    Affects Versions: 4.3.0, 4.2.0, 5.0.0
         Environment: # Cloudera Quick Start VM 5.12.1
# [CDH Oozie 4.1|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.12.1.releasenotes.html] - it actually includes all 4.2, 4.3, and 5.0 patches applied
# Spark 1.6, Spark 2.2
# /etc/oozie/oozie-site.xml
... added
{code:xml}
<property>
  <name>oozie.service.SparkConfigurationService.spark.configurations</name>
  <value>*=/etc/spark/conf</value>
</property>
{code}
# /etc/spark/conf/spark-defaults.conf
... added
{code}
spark.hadoop.mapreduce.application.classpath=
spark.hadoop.yarn.application.classpath=
{code}
# workflow.xml
{code:xml}
<workflow-app name="My Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-0ff5"/>
    <kill name="Kill">
        <message>Action failed, error 
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-0ff5">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>MySpark</name>
            <class>org.apache.oozie.example.SparkFileCopy</class>
            <jar>oozie-examples-4.3.0.jar</jar>
            <arg>/user/cloudera/spark-oozie/input</arg>
            <arg>/user/cloudera/spark-oozie/output</arg>
            <file>/user/cloudera/spark-oozie-examples/oozie-examples-4.3.0.jar#oozie-examples-4.3.0.jar</file>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
{code}
            Reporter: Sergey Zhemzhitsky


Currently the CDH distribution of Hadoop configures Spark2 with empty 
*spark.hadoop.mapreduce.application.classpath* and 
*spark.hadoop.yarn.application.classpath* properties in 
*/etc/spark2/conf/spark-defaults.conf*
{code}
spark.hadoop.mapreduce.application.classpath=
spark.hadoop.yarn.application.classpath=
{code}
The motivation for such a configuration is described in the Spark2 parcel 
installation scripts 
(_SPARK2\_ON\_YARN\-2.2.0.cloudera1.jar!/scripts/common.sh_):
{code}
# Override the YARN / MR classpath configs since we already include them when generating
# SPARK_DIST_CLASSPATH. This avoids having the same paths added to the classpath a second
# time and wasting file descriptors.
replace_spark_conf "spark.hadoop.mapreduce.application.classpath" "" "$SPARK_DEFAULTS"
replace_spark_conf "spark.hadoop.yarn.application.classpath" "" "$SPARK_DEFAULTS"
{code}
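For context, the reason those overrides can safely be empty under a plain spark-submit is that CDH parcels ship a one-jar-per-line classpath.txt next to the Spark config and derive SPARK_DIST_CLASSPATH from it. A minimal sketch of that mechanism (the directory and jar names below are made up for illustration, not the real parcel layout):

```shell
# Sketch only: emulate how a CDH-style setup script could turn the
# one-jar-per-line classpath.txt into SPARK_DIST_CLASSPATH.
CONF_DIR=$(mktemp -d)
printf '/opt/lib/hadoop-common.jar\n/opt/lib/hadoop-hdfs.jar\n' > "$CONF_DIR/classpath.txt"

# Join the lines with ':' to form a single Java classpath string.
SPARK_DIST_CLASSPATH=$(paste -sd: "$CONF_DIR/classpath.txt")
echo "$SPARK_DIST_CLASSPATH"
# prints /opt/lib/hadoop-common.jar:/opt/lib/hadoop-hdfs.jar
```

The bug then follows: Oozie reuses the spark-defaults file (with the empty overrides) but does not reproduce the classpath.txt step, so neither classpath ends up on the container's classpath.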

So when Oozie is configured to run with Spark2's spark-defaults.conf, its 
Spark actions usually fail with an exception like the following:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
The same exception is also thrown with Spark 1.6 if the empty 
*spark.hadoop.mapreduce.application.classpath* and 
*spark.hadoop.yarn.application.classpath* properties are present in 
*/etc/spark/conf/spark-defaults.conf*.
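Assuming the empty overrides are indeed the trigger, one possible workaround sketch (not a verified fix) is to point Oozie at a copy of spark-defaults.conf with those two empty properties stripped. The file contents and paths below are illustrative only:

```shell
# Hypothetical workaround sketch: filter the two empty classpath overrides out
# of the copy of spark-defaults.conf that Oozie reads. A demo input stands in
# for the real /etc/spark/conf/spark-defaults.conf.
SRC=$(mktemp)
DST=$(mktemp)
cat > "$SRC" <<'EOF'
spark.executor.memory=2g
spark.hadoop.mapreduce.application.classpath=
spark.hadoop.yarn.application.classpath=
EOF

# Drop only the lines whose value is empty; keep every other setting.
grep -vE '^spark\.hadoop\.(mapreduce|yarn)\.application\.classpath=$' "$SRC" > "$DST"
cat "$DST"
# prints spark.executor.memory=2g
```

With the overrides gone, the YARN-provided mapreduce.application.classpath and yarn.application.classpath defaults apply again in the launcher container.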

OOZIE-2547 is probably the cause of this issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
