[ https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176774#comment-14176774 ]

Tal Sliwowicz commented on SPARK-4006:
--------------------------------------

Fixed in: https://github.com/apache/spark/pull/2854

> Spark Driver crashes whenever an Executor is registered twice
> -------------------------------------------------------------
>
>                 Key: SPARK-4006
>                 URL: https://issues.apache.org/jira/browse/SPARK-4006
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 0.9.2, 1.0.2, 1.1.0
>         Environment: Mesos, Coarse Grained
>            Reporter: Tal Sliwowicz
>            Priority: Critical
>
> This is a major robustness issue for us (Taboola) in mission-critical, 
> time-sensitive (real-time) Spark jobs.
> We have long-running Spark drivers, and even with state-of-the-art 
> hardware, executors disconnect from time to time. In many cases the 
> RemoveExecutor message is never received, so when the new executor 
> registers, the driver crashes. In Mesos coarse-grained mode, executor ids are fixed. 
> The issue is with the System.exit(1) in BlockManagerMasterActor:
>   private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) {
>     if (!blockManagerInfo.contains(id)) {
>       blockManagerIdByExecutor.get(id.executorId) match {
>         case Some(manager) =>
>           // A block manager of the same executor already exists.
>           // This should never happen. Let's just quit.
>           logError("Got two different block manager registrations on " + id.executorId)
>           System.exit(1)
>         case None =>
>           blockManagerIdByExecutor(id.executorId) = id
>       }
>       logInfo("Registering block manager %s with %s RAM".format(
>         id.hostPort, Utils.bytesToString(maxMemSize)))
>       blockManagerInfo(id) =
>         new BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize, slaveActor)
>     }
>     listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
>   }
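
The linked PR changes that Some(manager) branch so a duplicate registration no longer kills the driver: the stale block manager entry for that executor is dropped and the new one is registered in its place. Below is a rough, self-contained sketch of that idea, not the merged diff - it models the bookkeeping with plain mutable maps and invented names (RegisterSketch, simplified BlockManagerId / BlockManagerInfo), so treat it only as an illustration of the remove-and-re-register behavior.

import scala.collection.mutable

object RegisterSketch {
  // Simplified stand-ins for Spark's BlockManagerId / BlockManagerInfo.
  case class BlockManagerId(executorId: String, hostPort: String)
  case class BlockManagerInfo(id: BlockManagerId, registeredAt: Long, maxMemSize: Long)

  private val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, BlockManagerInfo]
  private val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]

  def register(id: BlockManagerId, maxMemSize: Long): Unit = {
    if (!blockManagerInfo.contains(id)) {
      blockManagerIdByExecutor.get(id.executorId) match {
        case Some(oldId) =>
          // Same executor id already registered: assume the old instance is
          // dead, drop its state and accept the new registration instead of
          // calling System.exit(1) and killing the driver.
          println(s"Replacing stale block manager $oldId for executor ${id.executorId}")
          blockManagerIdByExecutor -= id.executorId
          blockManagerInfo -= oldId
        case None =>
          // First registration for this executor id, nothing to clean up.
      }
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo(id) = BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize)
      println(s"Registered block manager ${id.hostPort} with $maxMemSize bytes")
    }
  }

  def main(args: Array[String]): Unit = {
    // The same executor id registering twice, as with a Mesos coarse-grained
    // executor whose RemoveExecutor was never delivered to the driver.
    register(BlockManagerId("exec-1", "host-a:45123"), 512L * 1024 * 1024)
    register(BlockManagerId("exec-1", "host-a:45999"), 512L * 1024 * 1024)
  }
}

Running it with the same executor id twice (the Mesos coarse-grained scenario described above) replaces the stale entry instead of exiting the JVM.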



