Lei Xu created MESOS-4299:
-----------------------------

             Summary: Slave lives in two different clusters at the same time with different slave IDs
                 Key: MESOS-4299
                 URL: https://issues.apache.org/jira/browse/MESOS-4299
             Project: Mesos
          Issue Type: Bug
          Components: master, webui
    Affects Versions: 0.25.0
         Environment: Mesos 0.25.0
            Reporter: Lei Xu


I've migrated some nodes from Cluster A to Cluster B, and today I found that these nodes are registered in both Cluster A and Cluster B. Here is the {{/master/slaves}} response:

{code}
{
  "slaves": [
    {
      "active": false,
      "attributes": {
        "apps": "logstash",
        "colo": "cn5",
        "type": "prod"
      },
      "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
      "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
      "offered_resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      },
      "pid": "slave(1)@10.90.5.19:5051",
      "registered_time": 1451988622.66323,
      "reserved_resources": {},
      "resources": {
        "cpus": 32.0,
        "disk": 2728919.0,
        "mem": 128126.0,
        "ports": "[8100-10000, 31000-32000]"
      },
      "unreserved_resources": {
        "cpus": 32.0,
        "disk": 2728919.0,
        "mem": 128126.0,
        "ports": "[8100-10000, 31000-32000]"
      },
      "used_resources": {
        "cpus": 0,
        "disk": 0,
        "mem": 0
      }
    },
    .....
{code}
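To spot nodes in this state, one option is to compare the {{/master/slaves}} responses of the two clusters' masters and report every slave {{pid}} that appears in both. The following is a minimal sketch, assuming both masters' HTTP endpoints are reachable from where it runs. The master URLs and the script name in the usage comment are placeholders. Matching on {{pid}} (rather than {{id}}) is a choice made here because the slave ID differs between the two clusters.

```python
import json
import sys
import urllib.request


def fetch_slaves(master_url):
    """Fetch and decode /master/slaves from a master such as http://host:5050."""
    url = master_url.rstrip("/") + "/master/slaves"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def index_by_pid(payload):
    """Map each slave's pid (e.g. 'slave(1)@10.90.5.19:5051') to its JSON entry."""
    return {s["pid"]: s for s in payload.get("slaves", [])}


def registered_in_both(payload_a, payload_b):
    """Return (pid, id_in_a, active_in_a, id_in_b, active_in_b) for shared pids."""
    a, b = index_by_pid(payload_a), index_by_pid(payload_b)
    return [
        (pid, a[pid]["id"], a[pid].get("active"), b[pid]["id"], b[pid].get("active"))
        for pid in sorted(a.keys() & b.keys())
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder URLs): python find_dupes.py http://master-a:5050 http://master-b:5050
    cluster_a = fetch_slaves(sys.argv[1])
    cluster_b = fetch_slaves(sys.argv[2])
    for row in registered_in_both(cluster_a, cluster_b):
        print("pid %s: A id=%s active=%s | B id=%s active=%s" % row)
```

For the node in this report, such a check would list {{slave(1)@10.90.5.19:5051}} with ID {{3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2}} and {{active=False}} on one side.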

The following are the Mesos slave logs:

{quote}
I0105 18:36:22.683724  6452 slave.cpp:2248] Updated checkpointed resources from  to
I0105 18:37:09.900497  6459 slave.cpp:3926] Current disk usage 0.06%. Max allowed age: 1.798706758587755days
I0105 18:37:22.678374  6453 slave.cpp:3146] Master marked the slave as disconnected but the slave considers itself registered! Forcing re-registration.
I0105 18:37:22.678699  6453 slave.cpp:694] Re-detecting master
I0105 18:37:22.678715  6471 status_update_manager.cpp:176] Pausing sending status updates
I0105 18:37:22.678753  6453 slave.cpp:741] Detecting new master
I0105 18:37:22.678977  6456 status_update_manager.cpp:176] Pausing sending status updates
I0105 18:37:22.679047  6455 slave.cpp:705] New master detected at [email protected]:5050
I0105 18:37:22.679108  6455 slave.cpp:768] Authenticating with master [email protected]:5050
I0105 18:37:22.679136  6455 slave.cpp:773] Using default CRAM-MD5 authenticatee
I0105 18:37:22.679239  6455 slave.cpp:741] Detecting new master
I0105 18:37:22.679354  6464 authenticatee.cpp:115] Creating new client SASL connection
I0105 18:37:22.680883  6461 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5
I0105 18:37:22.680946  6461 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5'
I0105 18:37:22.681759  6455 authenticatee.cpp:252] Received SASL authentication step
I0105 18:37:22.682874  6454 authenticatee.cpp:292] Authentication success
I0105 18:37:22.682986  6441 slave.cpp:836] Successfully authenticated with master [email protected]:5050
I0105 18:37:22.684303  6454 slave.cpp:980] Re-registered with master [email protected]:5050
I0105 18:37:22.684455  6454 slave.cpp:1016] Forwarding total oversubscribed resources
I0105 18:37:22.684471  6468 status_update_manager.cpp:183] Resuming sending status updates
I0105 18:37:22.684649  6454 slave.cpp:2152] Updating framework 20150610-204949-3299432458-5050-25057-0000 pid to [email protected]:35708
I0105 18:37:22.685025  6452 status_update_manager.cpp:183] Resuming sending status updates
I0105 18:37:22.685117  6454 slave.cpp:2248] Updated checkpointed resources from  to
I0105 18:38:09.901587  6464 slave.cpp:3926] Current disk usage 0.06%. Max allowed age: 1.798706755730266days
I0105 18:38:22.679468  6451 slave.cpp:3146] Master marked the slave as disconnected but the slave considers itself registered! Forcing re-registration.
I0105 18:38:22.679739  6451 slave.cpp:694] Re-detecting master
I0105 18:38:22.679754  6453 status_update_manager.cpp:176] Pausing sending status updates
I0105 18:38:22.679785  6451 slave.cpp:741] Detecting new master
I0105 18:38:22.680054  6461 slave.cpp:705] New master detected at [email protected]:5050
I0105 18:38:22.680106  6470 status_update_manager.cpp:176] Pausing sending status updates
I0105 18:38:22.680107  6461 slave.cpp:768] Authenticating with master [email protected]:5050
I0105 18:38:22.680197  6461 slave.cpp:773] Using default CRAM-MD5 authenticatee
I0105 18:38:22.680271  6461 slave.cpp:741] Detecting new master

.................

W0105 19:05:38.207882  6450 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 09:12:38.666767  6468 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0002 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 12:13:35.782218  6441 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0117 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 12:23:22.348956  6444 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0118 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 12:35:36.660111  6443 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0119 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 12:40:43.735994  6461 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0121 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 12:42:09.539126  6456 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0120 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 12:52:40.787961  6465 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0122 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 12:58:10.425287  6461 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0123 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 13:03:32.236495  6456 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0125 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 13:10:58.501510  6472 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0126 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 13:16:04.233232  6460 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0127 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 14:17:24.198786  6472 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0115 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 14:18:57.036814  6464 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0005 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 14:36:19.755764  6460 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0112 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)
W0106 14:46:54.420217  6462 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0129 from [email protected]:5050 because it is not from the registered master ([email protected]:5050)


{quote}
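The logs show the slave being forced to re-register roughly once a minute (18:37:22 and 18:38:22), and every ignored shutdown message names a framework ID that shares the {{3e7ba6b1-29fd-44e8-9be2-f72896054ac6}} prefix with the slave ID in the {{/master/slaves}} response. A small glog parser makes both patterns easy to check over a longer log. This is a minimal sketch under several assumptions: the standard glog line prefix; the year 2016, because glog omits it ({{registered_time}} 1451988622 falls on 2016-01-05 UTC); and framework IDs following a {{<master id>-<counter>}} layout. The sample lines are copied from the logs, with the master addresses elided as "...".

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)
SHUTDOWN_IGNORED = re.compile(r"Ignoring shutdown framework message for (\S+)")
FORCED_REREG = "Forcing re-registration"

SAMPLE = [
    "I0105 18:37:22.678374  6453 slave.cpp:3146] Master marked the slave as "
    "disconnected but the slave considers itself registered! Forcing re-registration.",
    "I0105 18:38:22.679468  6451 slave.cpp:3146] Master marked the slave as "
    "disconnected but the slave considers itself registered! Forcing re-registration.",
    "W0105 19:05:38.207882  6450 slave.cpp:1973] Ignoring shutdown framework message "
    "for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from ...",
    "W0106 09:12:38.666767  6468 slave.cpp:1973] Ignoring shutdown framework message "
    "for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0002 from ...",
]


def parse(lines, year=2016):
    """Yield (timestamp, severity, message) for each well-formed glog line."""
    for line in lines:
        m = GLOG_LINE.match(line)
        if m:
            stamp = f"{year}{m['mmdd']} {m['time']}"
            yield datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"), m["sev"], m["msg"]


def summarize(lines):
    """Return (seconds between forced re-registrations, ignored framework IDs,
    distinct ID prefixes of those frameworks)."""
    events = list(parse(lines))
    rereg = [ts for ts, _, msg in events if FORCED_REREG in msg]
    intervals = [(b - a).total_seconds() for a, b in zip(rereg, rereg[1:])]
    ignored = []
    for _, _, msg in events:
        m = SHUTDOWN_IGNORED.search(msg)
        if m:
            ignored.append(m.group(1))
    # Assumed layout "<master id>-<counter>": drop the trailing counter.
    prefixes = sorted({fid.rsplit("-", 1)[0] for fid in ignored})
    return intervals, ignored, prefixes


if __name__ == "__main__":
    print(summarize(SAMPLE))
```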

It looks like the slave nodes still hold some metadata from Cluster A, yet they are still accepted when they register with Cluster B.

Should we perform some validation before a node joins a new cluster, in case the node has not been cleaned up?
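One possible shape for such validation is a pre-flight check run before the slave process starts. The sketch is hypothetical throughout: Mesos does not write a {{cluster_name}} marker, and {{join_blocker}}, {{record_cluster}} and {{has_checkpointed_slave}} are illustrative names. It treats a {{meta/slaves/latest}} entry under the slave's work directory as evidence of checkpointed state from an earlier registration, and refuses to proceed when that state was recorded for a different cluster (or for an unknown one).

```python
import tempfile
from pathlib import Path

# Hypothetical marker file kept by this pre-flight script; Mesos does not create it.
MARKER = "cluster_name"


def has_checkpointed_slave(work_dir):
    """True if work_dir/meta/slaves/latest exists (it is normally a symlink)."""
    latest = Path(work_dir, "meta", "slaves", "latest")
    return latest.is_symlink() or latest.exists()


def record_cluster(work_dir, cluster):
    """Remember which cluster this work_dir's checkpointed state belongs to."""
    Path(work_dir, MARKER).write_text(cluster + "\n")


def join_blocker(work_dir, target_cluster):
    """Return None if joining target_cluster looks safe, otherwise a reason string."""
    if not has_checkpointed_slave(work_dir):
        return None  # no checkpointed slave: it will register with a new ID
    marker = Path(work_dir, MARKER)
    recorded = marker.read_text().strip() if marker.exists() else None
    if recorded == target_cluster:
        return None  # same cluster: ordinary re-registration after a restart
    return (
        f"{work_dir} holds slave state from cluster {recorded or '<unknown>'}; "
        f"remove meta/slaves/latest before joining {target_cluster}"
    )


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for the slave's work directory.
    with tempfile.TemporaryDirectory() as demo:
        record_cluster(demo, "cluster-a")
        latest = Path(demo, "meta", "slaves", "latest")
        latest.parent.mkdir(parents=True)
        latest.symlink_to("placeholder-slave-dir")  # a dangling link is enough here
        print(join_blocker(demo, "cluster-a"))
        print(join_blocker(demo, "cluster-b"))
```

Keying the check on an operator-supplied cluster name, rather than on the master ID embedded in the slave ID, is a deliberate choice in this sketch: a master ID is not a stable cluster identity across master failovers.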



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
