Carlos Balduz created PIG-4228:
----------------------------------
Summary: SchemaTupleBackend error when working on a Spark 1.1.0 cluster
Key: PIG-4228
URL: https://issues.apache.org/jira/browse/PIG-4228
Project: Pig
Issue Type: Bug
Components: spark
Affects Versions: 0.14.0
Environment: spark-1.1.0
Reporter: Carlos Balduz
Whenever I try to run a Pig script on a Spark 1.1.0 cluster, I get the following error:
ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage
1.0 (...): java.lang.RuntimeException:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while
executing ForEach at [1-2[-1,-1]]
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
After debugging, I have found that the problem lies in SchemaTupleBackend.
Although SparkLauncher initializes this class on the driver, that
initialization is static and does not travel with the job to the executors;
so when POOutputConsumerIterator tries to fetch the results,
SchemaTupleBackend.newSchemaTupleFactory(...) is called against the
uninitialized backend and throws a RuntimeException.
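A workaround would be to (re)initialize SchemaTupleBackend lazily in each
executor JVM before the first tuple is consumed. The sketch below is mine,
not code from the Spark branch: the class name is hypothetical, and it assumes
a Configuration and PigContext can be made available on the executors (e.g.
through the serialized job conf). As far as I can tell,
SchemaTupleBackend.initialize(Configuration, PigContext) is the same call the
MapReduce backend makes during task setup.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.pig.data.SchemaTupleBackend;
import org.apache.pig.impl.PigContext;

/**
 * Hypothetical executor-side guard (sketch only). Ensures SchemaTupleBackend
 * is initialized once per executor JVM before any converter asks it for a
 * SchemaTupleFactory, mirroring what the MapReduce backend does in task setup.
 */
public final class SchemaTupleBackendGuard {

    private static volatile boolean initialized = false;

    private SchemaTupleBackendGuard() {}

    public static void ensureInitialized(Configuration conf, PigContext pigContext) {
        if (!initialized) {                          // fast path, no locking
            synchronized (SchemaTupleBackendGuard.class) {
                if (!initialized) {                  // double-checked locking
                    try {
                        SchemaTupleBackend.initialize(conf, pigContext);
                    } catch (Exception e) {
                        throw new RuntimeException(
                                "Could not initialize SchemaTupleBackend on executor", e);
                    }
                    initialized = true;
                }
            }
        }
    }
}
{code}

Calling something like SchemaTupleBackendGuard.ensureInitialized(...) at the
top of POOutputConsumerIterator.readNext(...) should prevent the failure,
since newSchemaTupleFactory(...) only throws when initialize(...) was never
run in that JVM.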