[ 
https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156886#comment-16156886
 ] 

Sergey Zhemzhitsky commented on OOZIE-2547:
-------------------------------------------

Hello [~rkanter], [~rohini], [~satishsaley]. I've noticed that the patch from 
this issue removes the **determineSparkJarsAndClasspath** method introduced in 
OOZIE-2277 by [~rkanter]. 

We are currently migrating our jobs from 
[CDH 5.7|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.7.0.releasenotes.html], 
which does not include this patch, to CDH 5.12, which does (the patch has been 
applied since 
[CDH 5.10|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.10.0.releasenotes.html]). 
This looks like a regression: all of our jobs that use the HDFS API internally 
now fail with the following error in the Oozie launcher logs:
{code}
Log Type: stderr
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 938
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more
Log Type: stdout
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 0
{code}
So it seems that this patch prevents Oozie from populating the Spark classpath 
correctly with the Hadoop libraries.
Could you please suggest how to provide the Spark job with 
hadoop-configuration.jar? Should it and all of its necessary dependencies be 
placed in the lib directory of the workflow?
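
One way to narrow this down is to run a small probe with the same classpath as the failing launcher and check whether the class from the stack trace is visible. The following is only a minimal sketch: the class name `ClasspathProbe` and the helper `isLoadable` are hypothetical and are not part of Oozie or Spark. The probe has no Hadoop dependency itself, so it should start even when the Hadoop jars are missing.

```java
import java.io.File;

// Hypothetical diagnostic helper, not part of Oozie or Spark.
public class ClasspathProbe {

    /** Returns true if the named class can be found, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class reported as missing in the launcher log.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.conf.Configuration";
        System.out.println(name + " loadable: " + isLoadable(name));

        // Print the effective classpath entries so missing jars are easy to spot.
        String classPath = System.getProperty("java.class.path", "");
        for (String entry : classPath.split(File.pathSeparator)) {
            System.out.println("  " + entry);
        }
    }
}
```

If the probe reports the class as not loadable, the printed entries show which jars the JVM actually received.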

> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2547-1.patch, OOZIE-2547-4.patch, 
> OOZIE-2547-5.patch, yarn-cluster_launcher.txt
>
>
> Currently, we pass jars using the --jars option when submitting a Spark job. 
> We also add the spark.yarn.dist.files option in yarn-client mode. 
> Instead, we can use only the --files option and pass the files that are 
> present in mapreduce.job.cache.files. While doing so, we make sure that 
> Spark won't make another copy of the files if they already exist on HDFS. We 
> saw issues where files were copied multiple times, causing exceptions such 
> as:
> {code}
> Diagnostics: Resource hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar changed on src filesystem
> {code}
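
The idea in the issue description, passing the entries of mapreduce.job.cache.files through a single --files option, can be sketched roughly as follows. This is an illustration only and not the code from the attached patches: the class `CacheFilesToSparkArgs` and the method `filesArgs` are hypothetical names, and the sketch assumes the property value is a comma-separated list of URIs. It only removes duplicate entries; it does not reproduce whatever the patch does to stop Spark from re-uploading files that already exist on HDFS.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the implementation from the OOZIE-2547 patches.
public class CacheFilesToSparkArgs {

    /**
     * Builds the "--files" arguments from a comma-separated value such as the
     * one stored in mapreduce.job.cache.files, dropping blanks and duplicates
     * while preserving the original order.
     */
    static List<String> filesArgs(String cacheFiles) {
        Set<String> unique = new LinkedHashSet<>();
        if (cacheFiles != null) {
            for (String uri : cacheFiles.split(",")) {
                String trimmed = uri.trim();
                if (!trimmed.isEmpty()) {
                    unique.add(trimmed);
                }
            }
        }
        List<String> args = new ArrayList<>();
        if (!unique.isEmpty()) {
            args.add("--files");
            args.add(String.join(",", unique));
        }
        return args;
    }

    public static void main(String[] args) {
        // Example value; the paths are made up for the demonstration.
        String cacheFiles = "hdfs://localhost/lib/a.jar, hdfs://localhost/lib/b.jar,hdfs://localhost/lib/a.jar";
        System.out.println(filesArgs(cacheFiles));
    }
}
```

Each file then appears once in the submit arguments, which is the precondition for avoiding the repeated copies described above.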



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
