SuYan created SPARK-13112:
-----------------------------

             Summary: CoarseGrainedExecutorBackend should register with the driver only 
after the Executor is ready
                 Key: SPARK-13112
                 URL: https://issues.apache.org/jira/browse/SPARK-13112
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.0
            Reporter: SuYan


desc: 
Because the disks on some hosts are busy, an executor can hit a TimeoutException 
while trying to register with the shuffle server on that host. 
The executor then calls exit(1) when a task is launched on the still-null executor.

When the YARN cluster is also short on resources, YARN treats that host as 
idle and prefers to allocate the replacement executor on the same host, so a 
single task can fail 4 times on the same host. 

Currently, CoarseGrainedExecutorBackend registers with the driver first, and only 
after that registration succeeds does it initialize the Executor. 
If an exception occurs during Executor initialization, the driver does not learn 
of it and still launches tasks on that executor, which then calls System.exit(1). 
{code}
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor(hostname) =>
    logInfo("Successfully registered with driver")
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
  ......
  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
{code}

It would be more reasonable to register with the driver only after the Executor is 
ready, and to make registerTimeout configurable.
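
To illustrate the proposed ordering, here is a minimal sketch, not Spark's 
actual code. The Executor, Driver and RecordingDriver types, the 
RegisterAfterInit object, and the "example.register.timeoutMs" key with its 
default are all hypothetical stand-ins invented for this example. The idea is 
only that the Executor is built first, and the driver hears about it only if 
that succeeds.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-ins for the real Spark classes; not Spark's API.
final class Executor(val executorId: String, val hostname: String)

trait Driver {
  def register(executorId: String, timeoutMs: Long): Unit
}

// Test double that records every registration it receives.
final class RecordingDriver extends Driver {
  var registered: List[(String, Long)] = Nil
  def register(executorId: String, timeoutMs: Long): Unit =
    registered = registered :+ (executorId -> timeoutMs)
}

object RegisterAfterInit {
  // Hypothetical configuration key and default, made up for this sketch.
  val TimeoutKey = "example.register.timeoutMs"
  val DefaultTimeoutMs = 30000L

  // Initialize the Executor first; register with the driver only on success.
  def start(
      executorId: String,
      hostname: String,
      conf: Map[String, String],
      driver: Driver,
      initExecutor: (String, String) => Executor): Either[Throwable, Executor] = {
    val timeoutMs = conf.get(TimeoutKey).map(_.toLong).getOrElse(DefaultTimeoutMs)
    Try(initExecutor(executorId, hostname)) match {
      case Success(executor) =>
        // The executor is ready, so the driver may now schedule tasks on it.
        driver.register(executorId, timeoutMs)
        Right(executor)
      case Failure(e) =>
        // Initialization failed: the driver never learns about this executor.
        Left(e)
    }
  }
}
```

With this ordering, a failure during initialization (for example a timeout 
against the shuffle server) surfaces before registration, so the driver never 
sends LaunchTask to a null executor.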



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
