[ https://issues.apache.org/jira/browse/PIG-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated PIG-4228:
----------------------------------
Attachment: groupby.pig
movies_data.csv
> SchemaTupleBackend error when working on a Spark 1.1.0 cluster
> --------------------------------------------------------------
>
> Key: PIG-4228
> URL: https://issues.apache.org/jira/browse/PIG-4228
> Project: Pig
> Issue Type: Bug
> Components: spark
> Affects Versions: 0.14.0
> Environment: spark-1.1.0
> Reporter: Carlos Balduz
> Labels: spark
> Attachments: groupby.pig, movies_data.csv
>
>
> Whenever I try to run a script on a Spark cluster, I get the following error:
> ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in
> stage 1.0 (...): java.lang.RuntimeException:
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while
> executing ForEach at [1-2[-1,-1]]
> at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
> at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
> at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
> at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
> at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
> at org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
> at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:54)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> After debugging, I have seen that the problem is inside SchemaTupleBackend.
> Although SparkLauncher initializes this class on the driver, that
> initialization is lost when the job is shipped to the executors, so when
> POOutputConsumerIterator tries to fetch the results,
> SchemaTupleBackend.newSchemaTupleFactory(...) is called on an uninitialized
> backend and throws a RuntimeException.
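> The sketch below shows the kind of executor-side guard that would avoid
> this: initialize SchemaTupleBackend lazily, once per executor JVM, before
> the operator pipeline starts consuming tuples. It is only an illustration,
> not a patch. It assumes SchemaTupleBackend.initialize(Configuration,
> PigContext) is the same entry point SparkLauncher uses on the driver; the
> ExecutorSchemaTupleInit class, its ensureInitialized() helper, and the
> hook point (e.g. the POOutputConsumerIterator constructor) are hypothetical.
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.pig.data.SchemaTupleBackend;
> import org.apache.pig.impl.PigContext;
>
> public final class ExecutorSchemaTupleInit {
>     private static volatile boolean initialized = false;
>
>     // Run once per executor JVM so the static state that SparkLauncher
>     // set up on the driver also exists where the tasks actually execute.
>     public static void ensureInitialized(Configuration conf, PigContext ctx) {
>         if (initialized) {
>             return;
>         }
>         synchronized (ExecutorSchemaTupleInit.class) {
>             if (!initialized) {
>                 try {
>                     // Assumed to mirror the driver-side initialization.
>                     SchemaTupleBackend.initialize(conf, ctx);
>                 } catch (Exception e) {
>                     throw new RuntimeException(
>                         "SchemaTupleBackend initialization failed on executor", e);
>                 }
>                 initialized = true;
>             }
>         }
>     }
> }
>
> This also assumes a deserialized PigContext and job Configuration are
> available inside the task, which may need to be shipped explicitly.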
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)