[ https://issues.apache.org/jira/browse/MESOS-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kapil Arya updated MESOS-7872:
------------------------------
    Fix Version/s:     (was: 1.4.1)
                       (was: 1.5.0)
                   1.4.0

> Scheduler hang when registration fails.
> ---------------------------------------
>
>                 Key: MESOS-7872
>                 URL: https://issues.apache.org/jira/browse/MESOS-7872
>             Project: Mesos
>          Issue Type: Bug
>          Components: scheduler driver
>    Affects Versions: 1.4.0
>            Reporter: Till Toenshoff
>            Assignee: Alexander Rukletsov
>              Labels: framework, reliability, scheduler
>             Fix For: 1.2.3, 1.3.2, 1.4.0
>
>
> I'm finding that if framework registration fails, the mesos driver client
> will hang indefinitely with the following output:
> {noformat}
> I0809 20:04:22.479391 73 sched.cpp:1187] Got error ''FrameworkInfo.role'
> is not a valid role: Role '/test/role/slashes' cannot start with a slash'
> I0809 20:04:22.479658 73 sched.cpp:2055] Asked to abort the driver
> I0809 20:04:22.479843 73 sched.cpp:1233] Aborting framework
> {noformat}
> I'd have expected one or both of the following:
> - SchedulerDriver.run() should have exited with a failed Proto.Status of some
> form
> - Scheduler.error() should have been invoked when the "Got error" occurred
> Steps to reproduce:
> - Launch a scheduler instance, have it register with a known-bad framework
> info. In this case a role containing slashes was used
> - Observe that the scheduler continues in a TASK_RUNNING state despite the
> failed registration. From all appearances it looks like the Scheduler
> implementation isn't invoked at all
> I'd guess that because this failure happens before framework registration,
> there's some error handling that isn't fully initialized at this point.
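
The behaviour the report expects can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyDriver`, `RecordingScheduler` and `Status` are hypothetical stand-ins written for this example, not the Mesos driver or its API. It shows a `run()` that blocks until the driver is aborted, returns an aborted status, and makes sure the scheduler's `error()` callback has been invoked when registration is rejected.

```python
import enum
import threading


class Status(enum.Enum):
    # Hypothetical status values for this toy model only.
    DRIVER_RUNNING = 1
    DRIVER_ABORTED = 2


class RecordingScheduler:
    """Hypothetical scheduler that records the errors it is told about."""

    def __init__(self):
        self.errors = []

    def error(self, driver, message):
        self.errors.append(message)


class ToyDriver:
    """Toy model of a scheduler driver; not the Mesos implementation."""

    def __init__(self, scheduler, registration_error=None):
        self.scheduler = scheduler
        self.registration_error = registration_error
        self.status = Status.DRIVER_RUNNING
        self._done = threading.Event()

    def abort(self):
        self.status = Status.DRIVER_ABORTED
        self._done.set()  # wake up any caller blocked in run()

    def _register(self):
        # Simulates a rejection that arrives before registration completes.
        if self.registration_error is not None:
            self.abort()
            self.scheduler.error(self, self.registration_error)

    def run(self, timeout=None):
        worker = threading.Thread(target=self._register)
        worker.start()
        self._done.wait(timeout)  # returns once abort() has been called
        worker.join()             # ensure error() has been delivered
        return self.status


scheduler = RecordingScheduler()
driver = ToyDriver(scheduler, registration_error="role cannot start with a slash")
status = driver.run(timeout=5)
```

In this model a rejected registration leaves `status` as `Status.DRIVER_ABORTED` and the message in `scheduler.errors`, which corresponds to the two outcomes listed in the report instead of an indefinite hang.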