[ https://issues.apache.org/jira/browse/MESOS-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov reassigned MESOS-7872:
------------------------------------------

    Assignee: Alexander Rukletsov

> Scheduler hang when registration fails (due to bad role)
> --------------------------------------------------------
>
>                 Key: MESOS-7872
>                 URL: https://issues.apache.org/jira/browse/MESOS-7872
>             Project: Mesos
>          Issue Type: Bug
>          Components: scheduler driver
>    Affects Versions: 1.4.0
>            Reporter: Till Toenshoff
>            Assignee: Alexander Rukletsov
>              Labels: framework, reliability, scheduler
>
> I'm finding that if framework registration fails, the mesos driver client
> will hang indefinitely with the following output:
> {noformat}
> I0809 20:04:22.479391 73 sched.cpp:1187] Got error ''FrameworkInfo.role' is not a valid role: Role '/test/role/slashes' cannot start with a slash'
> I0809 20:04:22.479658 73 sched.cpp:2055] Asked to abort the driver
> I0809 20:04:22.479843 73 sched.cpp:1233] Aborting framework
> {noformat}
> I'd have expected one or both of the following:
> - SchedulerDriver.run() should have exited with a failed Proto.Status of some
> form
> - Scheduler.error() should have been invoked when the "Got error" occurred
> Steps to reproduce:
> - Launch a scheduler instance, have it register with a known-bad framework
> info. In this case a role containing slashes was used
> - Observe that the scheduler continues in a TASK_RUNNING state despite the
> failed registration. From all appearances it looks like the Scheduler
> implementation isn't invoked at all
> I'd guess that because this failure happens before framework registration,
> there's some error handling that isn't fully initialized at this point.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
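
For illustration only, the behavior the reporter expected can be modelled with a small toy driver. This is a sketch in Python and does not use the Mesos API or reproduce sched.cpp: the names Status, RecordingScheduler and ToyDriver are hypothetical, and the only validation rule modelled is the "cannot start with a slash" check quoted in the log above. It shows both expectations from the report: the scheduler's error callback is invoked, and run() returns an aborted status instead of blocking.

```python
import threading
from enum import Enum


class Status(Enum):
    # Hypothetical stand-in for driver status values; not the Mesos enum.
    DRIVER_RUNNING = 1
    DRIVER_ABORTED = 2


class RecordingScheduler:
    """Hypothetical scheduler that records the errors it is told about."""

    def __init__(self):
        self.errors = []

    def error(self, driver, message):
        self.errors.append(message)


class ToyDriver:
    """Toy model of the expected behavior; not the real scheduler driver."""

    def __init__(self, scheduler, role):
        self.scheduler = scheduler
        self.role = role
        self.status = Status.DRIVER_RUNNING
        self._stopped = threading.Event()

    def _register(self):
        # Assumption: only the single rule seen in the log is checked here.
        if self.role.startswith("/"):
            return "Role '%s' cannot start with a slash" % self.role
        return None

    def abort(self):
        self.status = Status.DRIVER_ABORTED
        # Wake up run() so that a failed registration cannot leave it waiting.
        self._stopped.set()

    def run(self, timeout=1.0):
        problem = self._register()
        if problem is not None:
            # Expectation 2 from the report: the error callback is invoked.
            self.scheduler.error(self, problem)
            self.abort()
        # Expectation 1: run() returns with a failed status after an abort.
        self._stopped.wait(timeout)
        return self.status


scheduler = RecordingScheduler()
driver = ToyDriver(scheduler, "/test/role/slashes")
result = driver.run()
```

In this model the abort path sets the event that run() waits on, so the caller gets a status back immediately. Whether the real hang comes from a missing wake-up of this kind is not established by the report.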