Re: How to diagnose "could not compute split" errors and failed jobs?

Tathagata Das Mon, 23 Feb 2015 00:16:58 -0800

Could you find the executor logs on the executor where that task was
scheduled? That may provide more information on what caused the error.
Also take a look at where the block in question was stored, and where the
task was scheduled.
You will need to enable log4j INFO-level logging for this debugging.
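
For tracing a single block through the logs, a small standalone script can help. The sketch below is only an illustration and not part of Spark: the helper names `missing_block_ids` and `lines_mentioning` are made up here, the pattern is taken from the error text quoted in this thread, and it assumes that the INFO-level driver and executor logs mention the block id on the lines of interest.

```python
import re

# Pattern copied from the error text in this thread:
# "Could not compute split, block input-28-1424297042500 not found"
MISSING_BLOCK = re.compile(
    r"Could not compute split, block\s+(input-\d+-\d+)\s+not found"
)


def missing_block_ids(log_text):
    """Return distinct block ids named in 'Could not compute split' errors,
    in the order they first appear."""
    # Terminal or mail wrapping can split the message across lines,
    # so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    seen = []
    for block_id in MISSING_BLOCK.findall(flat):
        if block_id not in seen:
            seen.append(block_id)
    return seen


def lines_mentioning(log_lines, block_id):
    """Return (line_number, line) pairs for every line containing block_id."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if block_id in line
    ]


if __name__ == "__main__":
    # Sample text is the exception message reported earlier in this thread.
    sample = (
        "java.lang.Exception: Could not compute split, block "
        "input-28-1424297042500\nnot found\n"
    )
    for block_id in missing_block_ids(sample):
        print(block_id)
        for number, line in lines_mentioning(sample.splitlines(), block_id):
            print(number, line)
```

Run `missing_block_ids` over the driver log first, then `lines_mentioning` over each executor log. Read in timestamp order, the matching lines should indicate which executor last reported the block and which one the failing task ran on.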


TD

On Thu, Feb 19, 2015 at 10:38 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Not quite sure, but this could be the cause: one of your executors is stuck
> in a GC pause while another one requests data from it, so the request
> times out and ends in that exception. You can try increasing the Akka
> frame size and the ack wait timeout as follows:
>
>       .set("spark.core.connection.ack.wait.timeout","600")
>       .set("spark.akka.frameSize","50")
>
>
> Thanks
> Best Regards
>
> On Fri, Feb 20, 2015 at 6:21 AM, Tim Smith <secs...@gmail.com> wrote:
>
>> My streaming app runs fine for a few hours and then starts spewing "Could
>> not compute split, block input-xx-xxxxxxx not found" errors. After this,
>> jobs start to fail and batches start to pile up.
>>
>> My question isn't so much about why this error occurs but rather, how do I
>> trace what leads to it? I am using disk+memory for storage, so this
>> shouldn't be a case of data loss resulting from memory overrun.
>>
>> 15/02/18 22:04:49 ERROR JobScheduler: Error running job streaming job 1424297050000 ms.28
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 247644.0 failed 64 times, most recent failure: Lost task 3.63 in stage 247644.0 (TID 3705290, node-dn1-16-test.abcdefg.com): java.lang.Exception: Could not compute split, block input-28-1424297042500 not found
>>         at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Driver stacktrace:
>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>>         at scala.Option.foreach(Option.scala:236)
>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> Thanks,
>>
>> Tim
>>
>>
>
