[ https://issues.apache.org/jira/browse/PIG-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079868#comment-16079868 ]
liyunzhang_intel commented on PIG-5157:
---------------------------------------

[~nkollar]: Sorry for the late reply. Here is the result after solving the exception I mentioned last time, on Spark 1 in yarn-client mode:

{noformat}
export SPARK_JAR=hdfs://zly1.sh.intel.com:8020/user/root/spark-assembly-1.6.1-hadoop2.6.0.jar
export SPARK_HOME=$SPARK161   # download Spark 1.6.1
export HADOOP_USER_CLASSPATH_FIRST="true"
$PIG_HOME/bin/pig -x spark $PIG_HOME/bin/testJoin.pig
{noformat}

pig.properties:
{noformat}
pig.sort.readonce.loadfuncs=org.apache.pig.backend.hadoop.hbase.HBaseStorage,org.apache.pig.backend.hadoop.accumulo.AccumuloStorage
spark.master=yarn-client
{noformat}

testJoin.pig:
{code}
A = load './SkewedJoinInput1.txt' as (id,name,n);
B = load './SkewedJoinInput2.txt' as (id,name);
D = join A by (id,name), B by (id,name) parallel 10;
store D into './testJoin.out';
{code}

The script fails to generate a result, and the exception found in the log is:

{noformat}
[task-result-getter-0] 2017-07-10 12:16:45,667 WARN scheduler.TaskSetManager (Logging.scala:logWarning(70)) - Lost task 0.0 in stage 0.0 (TID 0, zly1.sh.intel.com): java.lang.IllegalStateException: unread block data
	at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

Can you verify whether the same problem occurs on your cluster in yarn-client mode? (On my cluster, it passed in local mode but failed in yarn-client mode.) The error looks like a DataNode problem, but I verified the environment with the Spark branch code and it passed, so I guess the problem is caused by the patch.

> Upgrade to Spark 2.0
> --------------------
>
>                 Key: PIG-5157
>                 URL: https://issues.apache.org/jira/browse/PIG-5157
>             Project: Pig
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: 0.18.0
>
>         Attachments: PIG-5157.patch
>
>
> Upgrade to Spark 2.0 (or latest)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
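Editor's note (not part of the original comment): an `IllegalStateException: unread block data` raised inside `JavaSerializerInstance.deserialize` on an executor is commonly a symptom of mismatched jar versions between the driver classpath and the classpath the executors load (here, the assembly jar referenced by `SPARK_JAR` on HDFS versus the local `SPARK_HOME` installation), rather than a DataNode problem. A hedged diagnostic sketch, reusing the paths and variables from the comment above (these commands are a suggestion for isolating the cause, not a verified fix):

{code}
# Confirm the local assembly jar under SPARK_HOME and the one uploaded to HDFS
# are the same build; a version skew between them can produce
# "unread block data" during task deserialization on executors.
ls -l $SPARK_HOME/lib/spark-assembly-*.jar
hadoop fs -ls hdfs://zly1.sh.intel.com:8020/user/root/spark-assembly-1.6.1-hadoop2.6.0.jar

# HADOOP_USER_CLASSPATH_FIRST changes class-resolution order on the cluster.
# Re-running the same script with it unset helps isolate whether classpath
# ordering (rather than the PIG-5157 patch itself) triggers the failure.
unset HADOOP_USER_CLASSPATH_FIRST
$PIG_HOME/bin/pig -x spark $PIG_HOME/bin/testJoin.pig
{code}

If both checks come back clean, that would strengthen the hypothesis in the comment that the patch itself is at fault.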