[ https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156886#comment-16156886 ]
Sergey Zhemzhitsky commented on OOZIE-2547:
-------------------------------------------

Hello [~rkanter], [~rohini], [~satishsaley],

I've noticed that the patch from this issue removes the {{determineSparkJarsAndClasspath}} method introduced in OOZIE-2277 by [~rkanter]. We are currently migrating our jobs from [CDH 5.7|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.7.0.releasenotes.html], which does not include this patch, to CDH 5.12, which has had it applied since [CDH 5.10|http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.10.0.releasenotes.html]. There seems to be a regression: all of our jobs that use the HDFS API internally have started to fail with the following error in the Oozie launcher logs:

{code}
Log Type: stderr
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 938
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more

Log Type: stdout
Log Upload Time: Thu Sep 07 11:43:40 +0300 2017
Log Length: 0
{code}

So it seems that this patch prevents Oozie from correctly populating the Spark classpath with the Hadoop libraries.
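For context, one configuration we are considering is relying on the Oozie sharelib rather than bundling Hadoop jars with the application. This is only a sketch of a {{job.properties}} (the property names are standard Oozie settings, but the host names, paths, and sharelib choice are illustrative assumptions, not our actual deployment):

{code}
# Hypothetical job.properties sketch -- values are illustrative.
nameNode=hdfs://localhost:8020
jobTracker=localhost:8032

# Ask Oozie to put the system sharelib on the launcher classpath,
# and select the sharelib set used for the spark action type.
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark

oozie.wf.application.path=${nameNode}/user/${user.name}/apps/spark-wf
{code}

Whether this is preferable to placing the jars in the workflow's lib directory is exactly the question above.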
Could you please suggest how to provide the Spark job with hadoop-configuration.jar? Should it, and all the necessary dependencies, be placed within the lib directory of the workflow?

> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>          Key: OOZIE-2547
>          URL: https://issues.apache.org/jira/browse/OOZIE-2547
>      Project: Oozie
>   Issue Type: Bug
>     Reporter: Satish Subhashrao Saley
>     Assignee: Satish Subhashrao Saley
>     Priority: Minor
>      Fix For: 4.3.0
>
>  Attachments: OOZIE-2547-1.patch, OOZIE-2547-4.patch, OOZIE-2547-5.patch, yarn-cluster_launcher.txt
>
>
> Currently, we pass jars using the --jars option while submitting a Spark job. We also add the spark.yarn.dist.files option in the case of yarn-client mode.
> Instead of that, we can have only the --files option and pass on the files that are present in mapreduce.job.cache.files. While doing so, we make sure that Spark won't make another copy of the files if they already exist on HDFS. We saw issues where files were copied multiple times, causing exceptions such as:
> {code}
> Diagnostics: Resource hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar changed on src filesystem
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)