Lei Xu created MESOS-4299:
-----------------------------
Summary: Slave lives in two different clusters at the same time
with different slave IDs
Key: MESOS-4299
URL: https://issues.apache.org/jira/browse/MESOS-4299
Project: Mesos
Issue Type: Bug
Components: master, webui
Affects Versions: 0.25.0
Environment: Mesos 0.25.0
Reporter: Lei Xu
I migrated some nodes from Cluster A to Cluster B, and today I found that these nodes
live in both Cluster A and Cluster B. Here is the {{/master/slaves}} response:
{code}
{
  "slaves": [
    {
      "active": false,
      "attributes": {
        "apps": "logstash",
        "colo": "cn5",
        "type": "prod"
      },
      "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
      "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
      "offered_resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      },
      "pid": "slave(1)@10.90.5.19:5051",
      "registered_time": 1451988622.66323,
      "reserved_resources": {},
      "resources": {
        "cpus": 32.0,
        "disk": 2728919.0,
        "mem": 128126.0,
        "ports": "[8100-10000, 31000-32000]"
      },
      "unreserved_resources": {
        "cpus": 32.0,
        "disk": 2728919.0,
        "mem": 128126.0,
        "ports": "[8100-10000, 31000-32000]"
      },
      "used_resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      }
    },
    .....
{code}
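A minimal sketch of how the overlap could be confirmed, assuming the {{/master/slaves}} response of each cluster has already been fetched and parsed into a dict. The function name {{slaves_in_both}} is hypothetical and not part of Mesos; it only relies on the {{slaves}}, {{pid}} and {{id}} fields shown in the response above.

```python
def slaves_in_both(cluster_a: dict, cluster_b: dict) -> dict:
    """Map each slave pid found in both responses to (id in A, id in B).

    Both arguments are parsed /master/slaves responses (hypothetical inputs).
    """
    ids_a = {s["pid"]: s["id"] for s in cluster_a.get("slaves", [])}
    ids_b = {s["pid"]: s["id"] for s in cluster_b.get("slaves", [])}
    # A pid present on both sides means the same slave process is known
    # to both masters, possibly under two different slave IDs.
    return {pid: (ids_a[pid], ids_b[pid]) for pid in ids_a.keys() & ids_b.keys()}
```

Matching on {{pid}} rather than {{id}} is a deliberate choice here, since the report is about the same node carrying a different slave ID in each cluster.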
The following are the Mesos slave logs:
{quote}
I0105 18:36:22.683724 6452 slave.cpp:2248] Updated checkpointed resources from
to
I0105 18:37:09.900497 6459 slave.cpp:3926] Current disk usage 0.06%. Max
allowed age: 1.798706758587755days
I0105 18:37:22.678374 6453 slave.cpp:3146] Master marked the slave as
disconnected but the slave considers itself registered! Forcing re-registration.
I0105 18:37:22.678699 6453 slave.cpp:694] Re-detecting master
I0105 18:37:22.678715 6471 status_update_manager.cpp:176] Pausing sending
status updates
I0105 18:37:22.678753 6453 slave.cpp:741] Detecting new master
I0105 18:37:22.678977 6456 status_update_manager.cpp:176] Pausing sending
status updates
I0105 18:37:22.679047 6455 slave.cpp:705] New master detected at
[email protected]:5050
I0105 18:37:22.679108 6455 slave.cpp:768] Authenticating with master
[email protected]:5050
I0105 18:37:22.679136 6455 slave.cpp:773] Using default CRAM-MD5 authenticatee
I0105 18:37:22.679239 6455 slave.cpp:741] Detecting new master
I0105 18:37:22.679354 6464 authenticatee.cpp:115] Creating new client SASL
connection
I0105 18:37:22.680883 6461 authenticatee.cpp:206] Received SASL authentication
mechanisms: CRAM-MD5
I0105 18:37:22.680946 6461 authenticatee.cpp:232] Attempting to authenticate
with mechanism 'CRAM-MD5'
I0105 18:37:22.681759 6455 authenticatee.cpp:252] Received SASL authentication
step
I0105 18:37:22.682874 6454 authenticatee.cpp:292] Authentication success
I0105 18:37:22.682986 6441 slave.cpp:836] Successfully authenticated with
master [email protected]:5050
I0105 18:37:22.684303 6454 slave.cpp:980] Re-registered with master
[email protected]:5050
I0105 18:37:22.684455 6454 slave.cpp:1016] Forwarding total oversubscribed
resources
I0105 18:37:22.684471 6468 status_update_manager.cpp:183] Resuming sending
status updates
I0105 18:37:22.684649 6454 slave.cpp:2152] Updating framework
20150610-204949-3299432458-5050-25057-0000 pid to
[email protected]:35708
I0105 18:37:22.685025 6452 status_update_manager.cpp:183] Resuming sending
status updates
I0105 18:37:22.685117 6454 slave.cpp:2248] Updated checkpointed resources from
to
I0105 18:38:09.901587 6464 slave.cpp:3926] Current disk usage 0.06%. Max
allowed age: 1.798706755730266days
I0105 18:38:22.679468 6451 slave.cpp:3146] Master marked the slave as
disconnected but the slave considers itself registered! Forcing re-registration.
I0105 18:38:22.679739 6451 slave.cpp:694] Re-detecting master
I0105 18:38:22.679754 6453 status_update_manager.cpp:176] Pausing sending
status updates
I0105 18:38:22.679785 6451 slave.cpp:741] Detecting new master
I0105 18:38:22.680054 6461 slave.cpp:705] New master detected at
[email protected]:5050
I0105 18:38:22.680106 6470 status_update_manager.cpp:176] Pausing sending
status updates
I0105 18:38:22.680107 6461 slave.cpp:768] Authenticating with master
[email protected]:5050
I0105 18:38:22.680197 6461 slave.cpp:773] Using default CRAM-MD5 authenticatee
I0105 18:38:22.680271 6461 slave.cpp:741] Detecting new master
.................
W0105 19:05:38.207882 6450 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 09:12:38.666767 6468 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0002 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 12:13:35.782218 6441 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0117 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 12:23:22.348956 6444 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0118 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 12:35:36.660111 6443 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0119 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 12:40:43.735994 6461 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0121 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 12:42:09.539126 6456 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0120 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 12:52:40.787961 6465 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0122 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 12:58:10.425287 6461 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0123 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 13:03:32.236495 6456 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0125 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 13:10:58.501510 6472 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0126 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 13:16:04.233232 6460 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0127 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 14:17:24.198786 6472 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0115 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 14:18:57.036814 6464 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0005 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 14:36:19.755764 6460 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0112 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
W0106 14:46:54.420217 6462 slave.cpp:1973] Ignoring shutdown framework message
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0129 from [email protected]:5050
because it is not from the registered master ([email protected]:5050)
{quote}
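To see which frameworks the other master keeps trying to shut down, the warnings in the log above could be extracted with something like the following. This is only a sketch: the helper names are hypothetical, it assumes the log is available as a list of text lines wrapped as in this message, and it assumes the sender appears as a single token of the form {{master@host:port}}.

```python
import re

# glog entries start with a severity letter and MMDD, e.g. "W0105 19:05:38.207882".
GLOG_START = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+ ")
IGNORED = re.compile(r"Ignoring shutdown framework message for (\S+) from (\S+)")


def unwrap(lines):
    """Rejoin log entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if GLOG_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def ignored_shutdowns(lines):
    """Return (framework_id, sender) for each ignored shutdown warning."""
    return [m.groups() for e in unwrap(lines) if (m := IGNORED.search(e))]
```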
It looks like the slave nodes still hold some metadata from Cluster A, but are
nevertheless allowed to register with Cluster B.
Should we perform some validation before a node joins a new cluster if the node
has not been cleaned up?