[ https://issues.apache.org/jira/browse/MESOS-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549924#comment-15549924 ]
Benjamin Mahler edited comment on MESOS-6249 at 10/5/16 8:58 PM:
-----------------------------------------------------------------

Linking in MESOS-786 which describes the lifecycle of registered and re-registered callbacks. Note that MESOS-786 was resolved but AFAICT we did not update to the newer semantics described in this ticket for schedulers that use the old-style driver.

However, it sounds like you care about this because you're trying to detect that the master has failed over. To do this you must introspect the {{MasterInfo}} provided to you in order to see if {{MasterInfo.id}} has changed.


was (Author: bmahler):
    Linking in MESOS-786 which describes the lifecycle of registered and re-registered callbacks. Note that MESOS-786 was resolved but AFAICT we did not update to the newer semantics described in this ticket for schedulers that use the old-style driver.

However, it sounds like you care about this because you're to detect that the master has failed over. To do this you must introspect the {{MasterInfo}} provided to you in order to see if {{MasterInfo.id}} has changed.

> On Mesos master failover the reregistered callback is not triggered
> -------------------------------------------------------------------
>
>                 Key: MESOS-6249
>                 URL: https://issues.apache.org/jira/browse/MESOS-6249
>             Project: Mesos
>          Issue Type: Bug
>          Components: java api
>    Affects Versions: 0.28.0, 0.28.1, 1.0.1
>         Environment: OS X 10.11.6
>            Reporter: Markus Jura
>
> On a Mesos master failover the reregistered callback of the Java API is not
> triggered. Only the registration callback is triggered which makes it hard
> for a framework to distinguish between these scenarios.
> This behaviour has been tested with the ConductR framework, both with the
> Java API version 0.28.0, 0.28.1 and 1.0.1. Below you find the logs from the
> master that got re-elected and from the ConductR framework.
> *Log: Mesos master on a master re-election*
> {code:bash}
> I0926 11:44:20.008306 3747840 zookeeper.cpp:259] A new leading master
> (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008458 3747840 master.cpp:1847] The newly elected leader is
> master@127.0.0.1:5050 with id ca5b9713-1eec-43e1-9d27-9ebc5c0f95b1
> I0926 11:44:20.008484 3747840 master.cpp:1860] Elected as the leading master!
> I0926 11:44:20.008498 3747840 master.cpp:1547] Recovering from registrar
> I0926 11:44:20.008607 3747840 registrar.cpp:332] Recovering registrar
> I0926 11:44:20.016340 4284416 registrar.cpp:365] Successfully fetched the
> registry (0B) in 7.702016ms
> I0926 11:44:20.016393 4284416 registrar.cpp:464] Applied 1 operations in
> 12us; attempting to update the 'registry'
> I0926 11:44:20.021428 4284416 registrar.cpp:509] Successfully updated the
> 'registry' in 5.019904ms
> I0926 11:44:20.021481 4284416 registrar.cpp:395] Successfully recovered
> registrar
> I0926 11:44:20.021611 528384 master.cpp:1655] Recovered 0 agents from the
> Registry (118B) ; allowing 10mins for agents to re-register
> I0926 11:44:20.536859 3747840 master.cpp:2424] Received SUBSCRIBE call for
> framework 'conductr' at
> scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> I0926 11:44:20.536969 3747840 master.cpp:2500] Subscribing framework conductr
> with checkpointing disabled and capabilities [ ]
> I0926 11:44:20.537401 3211264 hierarchical.cpp:271] Added framework conductr
> I0926 11:44:20.807895 528384 master.cpp:4787] Re-registering agent
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.808145 1601536 registrar.cpp:464] Applied 1 operations in
> 38us; attempting to update the 'registry'
> I0926 11:44:20.815757 1601536 registrar.cpp:509] Successfully updated the
> 'registry' in 7.568896ms
> I0926 11:44:20.815992 3747840 master.cpp:7447] Adding task
> 6abce9bb-895f-4f6f-be5b-25f6bd09f548 with resources mem(*):0 on agent
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1)
> I0926 11:44:20.816339 3747840 master.cpp:4872] Re-registered agent
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at slave(1)@127.0.0.1:5051
> (127.0.0.1) with cpus(*):8; mem(*):15360; disk(*):470832;
> ports(*):[31000-32000]
> I0926 11:44:20.816385 1601536 hierarchical.cpp:478] Added agent
> b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 (127.0.0.1) with cpus(*):8;
> mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated: cpus(*):0.9;
> mem(*):402.653; disk(*):1000; ports(*):[31000-31000, 31001-31500])
> I0926 11:44:20.816437 3747840 master.cpp:4940] Sending updated checkpointed
> resources to agent b99256c3-6905-44d3-bcc9-0d9e00d20fbe-S0 at
> slave(1)@127.0.0.1:5051 (127.0.0.1)
> I0926 11:44:20.816787 4284416 master.cpp:5725] Sending 1 offers to framework
> conductr (conductr) at
> scheduler-3f8b9645-7a17-4e9f-8ad5-077fe8c23b39@192.168.2.106:57164
> {code}
> *Log: ConductR framework*
> {code:bash}
> I0926 11:44:20.007189 66441216 detector.cpp:152] Detected a new leader:
> (id='87')
> I0926 11:44:20.007524 64294912 group.cpp:706] Trying to get
> '/mesos/json.info_0000000087' in ZooKeeper
> I0926 11:44:20.008625 63758336 zookeeper.cpp:259] A new leading master
> (UPID=master@127.0.0.1:5050) is detected
> I0926 11:44:20.008965 63758336 sched.cpp:330] New master detected at
> master@127.0.0.1:5050
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO MesosSchedulerClient
> [sourceThread=conductr-akka.actor.default-dispatcher-2,
> akkaTimestamp=09:44:20.009UTC,
> akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client,
> sourceActorSystem=conductr] - Mesos master has been disconnected..
> I0926 11:44:20.012472 63758336 sched.cpp:341] No credentials provided.
> Attempting to register without authentication
> I0926 11:44:20.537613 65904640 sched.cpp:743] Framework registered with
> conductr
> 2016-09-26T09:44:20Z MacBook-Pro-6.local INFO MesosSchedulerClient
> [sourceThread=conductr-akka.actor.default-dispatcher-18,
> akkaTimestamp=09:44:20.538UTC,
> akkaSource=akka.tcp://conductr@127.0.0.1:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client,
> sourceActorSystem=conductr] - Mesos master on localhost:5050 has been
> registered with ConductR framework id: conductr
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
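
Editorial illustration of the check suggested in the comment above (comparing {{MasterInfo.id}} across callbacks). This is only a minimal sketch, not code from the ticket: the class MasterFailoverDetector and the method onMasterInfo are hypothetical names, and the master IDs in main are made up. The assumption is that a scheduler would pass masterInfo.getId() into it from both Scheduler.registered(...) and Scheduler.reregistered(...), so that it does not matter which of the two callbacks the driver happens to invoke after a failover. The sketch deliberately avoids depending on the Mesos jar so that it runs standalone.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of the Mesos API): remembers the ID of the
 * last master seen and reports when a different ID shows up.
 */
public class MasterFailoverDetector {
    // Master ID from the most recent callback; null until the first one.
    private String lastMasterId;

    /**
     * Records the given master ID and returns true if it differs from the
     * previously recorded one. The very first call only records the ID and
     * returns false, since there is nothing to compare against yet.
     */
    public synchronized boolean onMasterInfo(String masterId) {
        Objects.requireNonNull(masterId, "masterId");
        boolean failedOver =
            lastMasterId != null && !lastMasterId.equals(masterId);
        lastMasterId = masterId;
        return failedOver;
    }

    public static void main(String[] args) {
        MasterFailoverDetector detector = new MasterFailoverDetector();

        // Assumption: in a real scheduler these strings would come from
        // masterInfo.getId() in registered(...) and reregistered(...).
        System.out.println(detector.onMasterInfo("master-a")); // false (first registration)
        System.out.println(detector.onMasterInfo("master-a")); // false (same master again)
        System.out.println(detector.onMasterInfo("master-b")); // true  (master ID changed)
    }
}
```

Because the remembered ID lives only in memory, a framework that restarts would see its first callback as an initial registration; persisting the last ID would be a separate design decision.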