Hi everyone,

This morning I ran a job on 30 wally nodes with a DOP of 224. It worked like
a charm.

Then I ran a similar job with the exact same configuration on the same input
data set. The only difference between them is that the second job computes
the degree of each vertex and, for vertices whose degree is higher than a
user-defined threshold, does a bit of magic (roughly, a bunch of coGroups).
The problem is that, even before the extra functions get called, I get the
following type of exception:

06/19/2015 12:06:43     CHAIN FlatMap (FlatMap at fromDataSet(Graph.java:171)) -> Combine(Distinct at fromDataSet(Graph.java:171))(222/224) switched to FAILED
java.lang.IllegalStateException: Update task on instance 29073fb0b0957198a2b67569b042d56b @ wally004 - 8 slots - URL: akka.tcp://flink@130.149.249.14:44528/user/taskmanager failed due to:
        at org.apache.flink.runtime.executiongraph.Execution$5.onFailure(Execution.java:860)
        at akka.dispatch.OnFailure.internal(Future.scala:228)
        at akka.dispatch.OnFailure.internal(Future.scala:227)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
        at scala.runtime.AbstractPartialFunction.applyOrElse(AbstractPartialFunction.scala:25)
        at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
        at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:134)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@130.149.249.14:44528/user/taskmanager#82700874]] after [100000 ms]
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
        at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
        at java.lang.Thread.run(Thread.java:722)
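
For anyone less familiar with the second job, here is a rough sketch of the
per-vertex degree filter it adds, written in plain Java rather than Flink/Gelly
code. It assumes an undirected edge list in which every edge counts towards
the degree of both endpoints, and "higher than the threshold" is read as
strictly greater. The class and method names (DegreeThreshold, degrees,
highDegree) are made up for illustration and are not part of Gelly.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, single-machine illustration of the degree filter; the real
// job runs the equivalent logic as distributed Flink operators.
public class DegreeThreshold {

    // Counts the degree of every vertex in an undirected edge list,
    // where each edge is a two-element array {source, target}.
    public static Map<Long, Integer> degrees(List<long[]> edges) {
        Map<Long, Integer> degree = new HashMap<>();
        for (long[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.merge(e[1], 1, Integer::sum);
        }
        return degree;
    }

    // Returns the vertices whose degree is strictly greater than the threshold.
    public static Set<Long> highDegree(Map<Long, Integer> degree, int threshold) {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Integer> entry : degree.entrySet()) {
            if (entry.getValue() > threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> edges = List.of(
            new long[] {1, 2}, new long[] {1, 3}, new long[] {1, 4}, new long[] {2, 3});
        Map<Long, Integer> degree = degrees(edges);
        System.out.println(highDegree(degree, 2)); // prints [1]
    }
}
```

With these four sample edges only vertex 1 (degree 3) passes a threshold of
2. The sketch only shows what the extra step computes; it says nothing about
the timeout itself.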


At first I thought that wally004 might be down, but I ssh'd into it and it
works fine.

The full output can be found here:
https://gist.github.com/andralungu/d222b75cb33aea57955d

Does anyone have an idea what may have triggered this? :(

Thanks!
Andra
