Hi all,

I have created an assembly jar from the 1.2 snapshot source by running [1], which sets the correct Hadoop version for our cluster and uses the hive profile. I have also written a relatively simple test program which starts by reading data from Parquet using a HiveContext. I compile the code against the assembly jar and then submit it to the cluster using [2]. The job fails at an early stage, on creating the HiveContext itself. The important part of the stack trace is [3].
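For reference, the failing part of the program is essentially just the context creation (a minimal sketch, not the actual code; the Parquet path is a placeholder, only the class name comes from the submit command in [2]):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object CreateGuidDomainDictionary {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CreateGuidDomainDictionary"))

    // The exception in [3] is thrown here, while instantiating the HiveContext,
    // before any data is touched.
    val hiveContext = new HiveContext(sc)

    // Never reached: the first real step would be reading the Parquet data.
    val data = hiveContext.parquetFile("/some/parquet/path")
    // ...
  }
}
```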
Could some of you please explain what is wrong and how it should be fixed? Looking for anything related, I have found only SPARK-4532 (https://issues.apache.org/jira/browse/SPARK-4532), but the fix for that bug is already merged in the source I used, so it is ruled out...

Thanks for the help,
Jakub

[1] ./sbt/sbt -Dhadoop.version=2.3.0-cdh5.1.3 -Pyarn -Phive assembly/assembly

[2] ./bin/spark-submit --num-executors 200 --master yarn-cluster --conf spark.yarn.jar=assembly/target/scala-2.10/spark-assembly-1.2.1-SNAPSHOT-hadoop2.3.0-cdh5.1.3.jar --class org.apache.spark.mllib.CreateGuidDomainDictionary root-0.1.jar ...some-args-here

[3] 14/12/05 20:28:15 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient)
Exception in thread "Driver" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate ...
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        ...