Carlos Balduz created PIG-4228:
----------------------------------
Summary: SchemaTupleBackend error when working on a Spark 1.1.0 cluster
Key: PIG-4228
URL: https://issues.apache.org/jira/browse/PIG-4228
Project: Pig
Issue Type: Bug
Components: spark
Affects Versions: 0.14.0
Environment: spark-1.1.0
Reporter: Carlos Balduz
Whenever I try to run a Pig script on a Spark 1.1.0 cluster, I get the following error:
ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage
1.0 (...): java.lang.RuntimeException:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while
executing ForEach at [1-2[-1,-1]]
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
After debugging, I have found that the problem lies in SchemaTupleBackend.
Although SparkLauncher initializes this class on the driver, that
initialization is static and does not travel with the job to the executors;
so when POOutputConsumerIterator tries to fetch the results,
SchemaTupleBackend.newSchemaTupleFactory(...) is called against the
uninitialized backend and throws a RuntimeException.
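A workaround would be to (re)initialize SchemaTupleBackend lazily in each
executor JVM before the first tuple is consumed. The sketch below is mine,
not code from the Spark branch: the class name is hypothetical, and it assumes
a Configuration and PigContext can be made available on the executors (e.g.
through the serialized job conf). As far as I can tell,
SchemaTupleBackend.initialize(Configuration, PigContext) is the same call the
MapReduce backend makes during task setup.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.pig.data.SchemaTupleBackend;
import org.apache.pig.impl.PigContext;

/**
 * Hypothetical executor-side guard (sketch only). Ensures SchemaTupleBackend
 * is initialized once per executor JVM before any converter asks it for a
 * SchemaTupleFactory, mirroring what the MapReduce backend does in task setup.
 */
public final class SchemaTupleBackendGuard {

    private static volatile boolean initialized = false;

    private SchemaTupleBackendGuard() {}

    public static void ensureInitialized(Configuration conf, PigContext pigContext) {
        if (!initialized) {                          // fast path, no locking
            synchronized (SchemaTupleBackendGuard.class) {
                if (!initialized) {                  // double-checked locking
                    try {
                        SchemaTupleBackend.initialize(conf, pigContext);
                    } catch (Exception e) {
                        throw new RuntimeException(
                                "Could not initialize SchemaTupleBackend on executor", e);
                    }
                    initialized = true;
                }
            }
        }
    }
}
{code}

Calling something like SchemaTupleBackendGuard.ensureInitialized(...) at the
top of POOutputConsumerIterator.readNext(...) should prevent the failure,
since newSchemaTupleFactory(...) only throws when initialize(...) was never
run in that JVM.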