Till Toenshoff created MESOS-7872:

             Summary: Scheduler hang when registration fails (due to bad role)
                 Key: MESOS-7872
                 URL: https://issues.apache.org/jira/browse/MESOS-7872
             Project: Mesos
          Issue Type: Bug
            Reporter: Till Toenshoff

I'm finding that if framework registration fails, the Mesos scheduler driver 
hangs indefinitely after logging the following output:
I0809 20:04:22.479391    73 sched.cpp:1187] Got error ''FrameworkInfo.role' is not a valid role: Role '/test/role/slashes' cannot start with a slash'
I0809 20:04:22.479658    73 sched.cpp:2055] Asked to abort the driver
I0809 20:04:22.479843    73 sched.cpp:1233] Aborting framework 

I'd have expected one or both of the following:
- SchedulerDriver.run() should have exited with a failed Proto.Status of some 
- Scheduler.error() should have been invoked when the "Got error" occurred
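
To make the expectation concrete, here is a toy model of the behaviour described in the two points above. It is only a sketch and not the real driver: ToyDriver and RecordingScheduler are hypothetical stand-ins, the Status enum is a local copy whose numeric values are illustrative, and only the DRIVER_* status names and the error() callback are meant to correspond to the Mesos scheduler API.

```python
from enum import Enum


class Status(Enum):
    # Local stand-in; names follow the Mesos driver statuses,
    # numeric values here are illustrative only.
    DRIVER_NOT_STARTED = 1
    DRIVER_RUNNING = 2
    DRIVER_ABORTED = 3
    DRIVER_STOPPED = 4


class RecordingScheduler:
    """Hypothetical scheduler that only records error() callbacks."""

    def __init__(self):
        self.errors = []

    def error(self, driver, message):
        self.errors.append(message)


class ToyDriver:
    """Hypothetical driver that models the expected contract only."""

    def __init__(self, scheduler, registration_error=None):
        self.scheduler = scheduler
        self.registration_error = registration_error

    def run(self):
        if self.registration_error is not None:
            # Expected: surface the error to the scheduler...
            self.scheduler.error(self, self.registration_error)
            # ...and return from run() instead of blocking forever.
            return Status.DRIVER_ABORTED
        return Status.DRIVER_STOPPED


scheduler = RecordingScheduler()
driver = ToyDriver(
    scheduler, "Role '/test/role/slashes' cannot start with a slash"
)
status = driver.run()
```

In this model a caller can react to the failure either through the recorded error or through the returned status, which is what the real driver does not appear to allow in the failing case.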

Steps to reproduce:
- Launch a scheduler instance and have it register with a known-bad framework 
info. In this case a role containing slashes was used.
- Observe that the scheduler continues in a TASK_RUNNING state despite the 
failed registration. From all appearances, the Scheduler implementation isn't 
invoked at all.
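
As a possible stop-gap on the framework side, the role could be checked before the driver is started. The sketch below covers only the one rule quoted in the log output above (no leading slash); check_role is a hypothetical helper, not part of any Mesos API, and the master may well enforce further rules that are not modelled here.

```python
def check_role(role):
    """Return an error string for a role with a leading slash, else None.

    Hypothetical client-side check; it mirrors only the rule quoted
    in the master's error message and nothing else.
    """
    if role.startswith("/"):
        return "Role '%s' cannot start with a slash" % role
    return None


# The role used in this report is rejected; a relative one passes.
bad = check_role("/test/role/slashes")
good = check_role("test/role")
```

A scheduler could refuse to start the driver when check_role returns a message, which avoids the hang for this particular input but does not address the underlying driver behaviour.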

I'd guess that because this failure happens before framework registration, 
there's some error handling that isn't fully initialized at this point.
