[ 
https://issues.apache.org/jira/browse/SPARK-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681913#comment-14681913
 ] 

Kevin Cox commented on SPARK-9820:
----------------------------------

For our jobs, the frequency of this exception has been increasing over time. It 
did not cause errors until we enabled dynamic allocation, though.

!Frequency.png!

Note that this is the "ERROR AkkaRpcEnv: Ignore error: null" exception, not the 
"CoarseGrainedExecutorBackend: Cannot register with driver: 
akka.tcp://[email protected]:47462/user/CoarseGrainedScheduler" error from 
SPARK-8592.
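
For anyone who wants to track the same trend, here is a minimal sketch of how 
the SPARK-9820 signature could be counted per day while ignoring the SPARK-8592 
message. It assumes driver log lines start with the "yy/MM/dd HH:mm:ss" prefix 
seen in the excerpts in this issue; the function name and the sample lines are 
hypothetical and are not taken from our jobs.

```python
import re
from collections import Counter

# Signature of the exception discussed in this issue (SPARK-9820).
NPE_SIGNATURE = "ERROR AkkaRpcEnv: Ignore error: null"

# Assumed log prefix "yy/MM/dd HH:mm:ss", as in the excerpts in this issue.
DATE_PREFIX = re.compile(r"^(\d{2}/\d{2}/\d{2}) \d{2}:\d{2}:\d{2} ")


def count_npe_per_day(lines):
    """Count lines carrying the SPARK-9820 signature, grouped by date."""
    counts = Counter()
    for line in lines:
        match = DATE_PREFIX.match(line)
        # Lines with any other message (e.g. the SPARK-8592 one) are skipped.
        if match and NPE_SIGNATURE in line:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "15/08/10 06:37:01 ERROR AkkaRpcEnv: Ignore error: null",
    "15/08/11 06:37:01 ERROR AkkaRpcEnv: Ignore error: null",
    "15/08/11 07:02:44 ERROR AkkaRpcEnv: Ignore error: null",
    "15/08/11 07:03:00 ERROR CoarseGrainedExecutorBackend: Cannot register with driver: ...",
]
daily_counts = count_npe_per_day(sample)
print(sorted(daily_counts.items()))
```

Matching on the full message text rather than on "ERROR" alone is what keeps 
the two issues from being counted together.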

> NullPointerException that causes failure to request executors.
> --------------------------------------------------------------
>
>                 Key: SPARK-9820
>                 URL: https://issues.apache.org/jira/browse/SPARK-9820
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>            Reporter: Kevin Cox
>              Labels: nullpointerexception
>         Attachments: Frequency.png
>
>
> After the job moves from YARN ACCEPTED to RUNNING, it immediately raises the 
> following exception.
> {code}
> 15/08/11 06:37:01 ERROR AkkaRpcEnv: Ignore error: null
> java.lang.NullPointerException
>       at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.actorRef$lzycompute(AkkaRpcEnv.scala:281)
>       at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.actorRef(AkkaRpcEnv.scala:281)
>       at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.toString(AkkaRpcEnv.scala:322)
>       at java.lang.String.valueOf(String.java:2849)
>       at java.lang.StringBuilder.append(StringBuilder.java:128)
>       at scala.StringContext.standardInterpolator(StringContext.scala:122)
>       at scala.StringContext.s(StringContext.scala:90)
>       at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(YarnSchedulerBackend.scala:106)
>       at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(YarnSchedulerBackend.scala:106)
>       at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
>       at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint.logInfo(YarnSchedulerBackend.scala:96)
>       at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:106)
>       at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
>       at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
>       at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
>       at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
>       at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>       at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>       at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>       at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
>       at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
>       at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>       at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
>       at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
>       at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
>       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>       at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>       at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> Then later it can't request executors.
> {code}
> 15/08/11 06:37:07 INFO YarnScheduler: Adding task set 0.0 with 36 tasks
> 15/08/11 06:37:08 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
> 15/08/11 06:37:08 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 1 total executors!
> 15/08/11 06:37:09 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
> 15/08/11 06:37:09 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 2 total executors!
> 15/08/11 06:37:10 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
> 15/08/11 06:37:10 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 3 total executors!
> 15/08/11 06:37:11 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
> 15/08/11 06:37:11 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 4 total executors!
> 15/08/11 06:37:12 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
> 15/08/11 06:37:12 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 5 total executors!
> 15/08/11 06:37:13 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
> 15/08/11 06:37:13 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 6 total executors!
> {code}
> This causes the job to hang forever.
> {code}
> WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
