[jira] [Resolved] (SPARK-604) reconnect if mesos slaves dies

Sean Owen (JIRA) Fri, 15 May 2015 06:51:32 -0700

     [ https://issues.apache.org/jira/browse/SPARK-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen resolved SPARK-604.
-----------------------------
    Resolution: Cannot Reproduce

This issue is stale at this point, and no similar failures have been reported recently.

> reconnect if mesos slaves dies
> ------------------------------
>
>                 Key: SPARK-604
>                 URL: https://issues.apache.org/jira/browse/SPARK-604
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>
> When running on Mesos, if a slave goes down, Spark doesn't try to reassign 
> the work to another machine. Even if the slave comes back up, the job is 
> doomed.
> Currently, when this happens, the driver logs show only this:
> 12/11/01 16:48:56 INFO mesos.MesosSchedulerBackend: Mesos slave lost: 201210312057-1560611338-5050-24091-52
> Exception in thread "Thread-346" java.util.NoSuchElementException: key not found: value: "201210312057-1560611338-5050-24091-52"
>     at scala.collection.MapLike$class.default(MapLike.scala:224)
>     at scala.collection.mutable.HashMap.default(HashMap.scala:43)
>     at scala.collection.MapLike$class.apply(MapLike.scala:135)
>     at scala.collection.mutable.HashMap.apply(HashMap.scala:43)
>     at spark.scheduler.cluster.ClusterScheduler.slaveLost(ClusterScheduler.scala:255)
>     at spark.scheduler.mesos.MesosSchedulerBackend.slaveLost(MesosSchedulerBackend.scala:275)
> 12/11/01 16:48:56 INFO mesos.MesosSchedulerBackend: driver.run() returned with code DRIVER_ABORTED
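
The trace above shows scala.collection.mutable.HashMap.apply being called from ClusterScheduler.slaveLost with a key that is not in the map, which throws NoSuchElementException. The following is a minimal sketch of that failure mode and of one defensive alternative. It is not Spark's actual code: the object SlaveLostSketch, the map slaveIdToHost, and both methods are hypothetical names chosen for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for scheduler bookkeeping; not Spark's actual code.
object SlaveLostSketch {
  // Illustrative map from a slave ID to the host it runs on.
  val slaveIdToHost = mutable.HashMap[String, String]()

  // Unsafe variant: HashMap.apply throws NoSuchElementException
  // when the slave ID was never registered (or was already removed).
  def slaveLostUnsafe(slaveId: String): String =
    slaveIdToHost(slaveId)

  // Defensive variant: remove returns an Option, so an unknown
  // slave ID yields None instead of an exception.
  def slaveLostSafe(slaveId: String): Option[String] =
    slaveIdToHost.remove(slaveId)
}
```

With the defensive variant, a caller could log and ignore a None result rather than letting the exception propagate out of the callback, which in the log above is followed by driver.run() returning DRIVER_ABORTED.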



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
