[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057075#comment-16057075 ]
liyunzhang_intel commented on PIG-5157:
---------------------------------------

[~nkollar]: after applying the patch, I tested a simple query in a yarn-client env.

build jar:
{noformat}
ant clean -v -Dhadoopversion=2 jar-spark12
{noformat}

testJoin.pig:
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name) parallel 10;
store D into './testJoin.out';
{code}

spark1:
{noformat}
export SPARK_HOME=xxxx
export SPARK_JAR=hdfs://xxxx:8020/user/root/spark-assembly-1.6.1-hadoop2.6.0.jar
$PIG_HOME/bin/pig -x spark -logfile $PIG_HOME/logs/pig.log testJoin.pig
{noformat}

error in logs/pig:
{noformat}
java.lang.NoClassDefFoundError: org/apache/spark/scheduler/SparkListenerInterface
	at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecutionEngine.<init>(SparkExecutionEngine.java:35)
	at org.apache.pig.backend.hadoop.executionengine.spark.SparkExecType.getExecutionEngine(SparkExecType.java:42)
	at org.apache.pig.impl.PigContext.<init>(PigContext.java:269)
	at org.apache.pig.impl.PigContext.<init>(PigContext.java:256)
	at org.apache.pig.Main.run(Main.java:389)
	at org.apache.pig.Main.main(Main.java:175)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.SparkListenerInterface
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 12 more
{noformat}

spark2 (patch PIG-5246_2.patch):
{noformat}
export SPARK_HOME=xxxx
$PIG_HOME/bin/pig -x spark -logfile $PIG_HOME/logs/pig.log testJoin.pig
{noformat}

error in logs/pig:
{noformat}
[main] 2017-06-21 14:14:05,791 ERROR spark.JobGraphBuilder (JobGraphBuilder.java:sparkOperToRDD(187)) - throw exception in sparkOperToRDD:
org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:763)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:762)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
	at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:762)
	at org.apache.spark.api.java.JavaRDDLike$class.mapPartitions(JavaRDDLike.scala:166)
	at org.apache.spark.api.java.AbstractJavaRDDLike.mapPartitions(JavaRDDLike.scala:45)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.ForEachConverter.convert(ForEachConverter.java:64)
	at org.apache.pig.backend.hadoop.executionengine.spark.converter.ForEachConverter.convert(ForEachConverter.java:45)
	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:292)
	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.physicalToRDD(JobGraphBuilder.java:248)
	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.sparkOperToRDD(JobGraphBuilder.java:182)
	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:112)
	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
	at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:233)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
	at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
{noformat}

I will investigate the reason, but please retest it in your env. If I have misunderstood anything, please tell me.

> Upgrade to Spark 2.0
> --------------------
>
>                 Key: PIG-5157
>                 URL: https://issues.apache.org/jira/browse/PIG-5157
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: 0.18.0
>
>         Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
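Addendum on the spark2 failure: Spark's ClosureCleaner.ensureSerializable raises "Task not serializable" when the function object handed to mapPartitions (here built by ForEachConverter) captures a field whose class does not implement java.io.Serializable. This is not Pig's actual code, just a minimal stdlib-only sketch of the same check with hypothetical class names (PlanContext, BadFunc, GoodFunc), to show the usual cause and fix:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableCheck {
    // Stand-in for a non-serializable dependency (e.g. a conf or plan object).
    static class PlanContext { }

    // Mimics a function shipped to executors: Serializable itself, but it
    // captures a non-serializable field, so writeObject fails -- the same
    // condition ClosureCleaner.ensureSerializable reports.
    static class BadFunc implements Serializable {
        final PlanContext ctx = new PlanContext();   // problem field
    }

    // One common fix: mark the offending field transient (or avoid the capture)
    // and rebuild it lazily on the executor side.
    static class GoodFunc implements Serializable {
        transient PlanContext ctx = new PlanContext();
    }

    // Same test Spark performs: try Java-serializing the closure object.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {           // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("BadFunc serializable: " + isSerializable(new BadFunc()));   // false
        System.out.println("GoodFunc serializable: " + isSerializable(new GoodFunc())); // true
    }
}
```

So one direction to check is whether the Spark 2 patch added a non-serializable field to ForEachConverter's function object (or to something it transitively references).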