[ 
https://issues.apache.org/jira/browse/MESOS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966565#comment-16966565
 ] 

suisuinian commented on MESOS-6421:
-----------------------------------

 

hi neil  ,how to resove it ?    i have the same problem  .

I1104 19:02:49.445144 457678 slave.cpp:3146] Master marked the slave as 
disconnected but the slave considers itself registered! Forcing re-registration.
I1104 19:02:49.445335 457678 slave.cpp:694] Re-detecting master
I1104 19:02:49.445340 457644 status_update_manager.cpp:176] Pausing sending 
status updates
I1104 19:02:49.445366 457678 slave.cpp:741] Detecting new master
I1104 19:02:49.445482 457668 status_update_manager.cpp:176] Pausing sending 
status updates  

> Agent enters re-registration loop after "--recovery=cleanup"
> ------------------------------------------------------------
>
>                 Key: MESOS-6421
>                 URL: https://issues.apache.org/jira/browse/MESOS-6421
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>            Priority: Major
>              Labels: mesosphere
>
> Repro:
> 1. Start a master
> 2. Start an agent. The agent should register with the master.
> 3. Ctrl-C agent.
> 4. Start agent with {{--recover=cleanup}}. Agent will cleanup {{work_dir}} 
> and terminate.
> 5. Start agent without any recovery flag. Agent will then proceed to register 
> and obtain a new slave ID. However, it will repeatedly disconnect and 
> re-register with the master:
> {noformat}
> I1019 16:59:21.859954 3747840 slave.cpp:915] New master detected at 
> [email protected]:5050
> I1019 16:59:21.859997 3747840 slave.cpp:936] No credentials provided. 
> Attempting to register without authentication
> I1019 16:59:21.860038 3747840 slave.cpp:947] Detecting new master
> I1019 16:59:22.712779 528384 slave.cpp:1115] Registered with master 
> [email protected]:5050; given agent ID 3615d417-b3fd-4e00-b794-9396456c3f6a-S1
> I1019 16:59:22.712908 2138112 status_update_manager.cpp:184] Resuming sending 
> status updates
> I1019 16:59:22.713953 528384 slave.cpp:1175] Forwarding total oversubscribed 
> resources {}
> I1019 16:59:25.082317 3747840 slave.cpp:4147] Master marked the agent as 
> disconnected but the agent considers itself registered! Forcing 
> re-registration.
> I1019 16:59:25.082756 3747840 slave.cpp:904] Re-detecting master
> I1019 16:59:25.082767 2138112 status_update_manager.cpp:177] Pausing sending 
> status updates
> I1019 16:59:25.082801 3747840 slave.cpp:947] Detecting new master
> I1019 16:59:25.083293 4284416 status_update_manager.cpp:177] Pausing sending 
> status updates
> I1019 16:59:25.083300 528384 slave.cpp:915] New master detected at 
> [email protected]:5050
> I1019 16:59:25.083349 528384 slave.cpp:936] No credentials provided. 
> Attempting to register without authentication
> I1019 16:59:25.083395 528384 slave.cpp:947] Detecting new master
> I1019 16:59:25.869060 1064960 slave.cpp:1217] Re-registered with master 
> [email protected]:5050
> I1019 16:59:25.869246 4284416 status_update_manager.cpp:184] Resuming sending 
> status updates
> I1019 16:59:25.869246 1064960 slave.cpp:1253] Forwarding total oversubscribed 
> resources {}
> I1019 16:59:40.087697 4284416 slave.cpp:4147] Master marked the agent as 
> disconnected but the agent considers itself registered! Forcing 
> re-registration.
> I1019 16:59:40.088105 4284416 slave.cpp:904] Re-detecting master
> I1019 16:59:40.088120 3747840 status_update_manager.cpp:177] Pausing sending 
> status updates
> I1019 16:59:40.088145 4284416 slave.cpp:947] Detecting new master
> I1019 16:59:40.088599 2138112 status_update_manager.cpp:177] Pausing sending 
> status updates
> I1019 16:59:40.088623 1601536 slave.cpp:915] New master detected at 
> [email protected]:5050
> I1019 16:59:40.088656 1601536 slave.cpp:936] No credentials provided. 
> Attempting to register without authentication
> I1019 16:59:40.088716 1601536 slave.cpp:947] Detecting new master
> I1019 16:59:41.006837 2138112 slave.cpp:1217] Re-registered with master 
> [email protected]:5050
> I1019 16:59:41.007017 2674688 status_update_manager.cpp:184] Resuming sending 
> status updates
> I1019 16:59:41.007035 2138112 slave.cpp:1253] Forwarding total oversubscribed 
> resources {}
> I1019 16:59:55.089197 2674688 slave.cpp:4147] Master marked the agent as 
> disconnected but the agent considers itself registered! Forcing 
> re-registration.
> {noformat}
> This continues on, seemingly indefinitely. Master logs during this period:
> {noformat}
> I1019 16:59:04.307692 528384 master.cpp:5579] Received update of agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S0 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) with total oversubscribed resources {}
> I1019 16:59:04.307929 528384 hierarchical.cpp:555] Agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S0 (10.0.9.176) updated with 
> oversubscribed resources {} (total: cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000], allocated: {})
> E1019 16:59:06.411664 4820992 process.cpp:2154] Failed to shutdown socket 
> with fd 12: Socket is not connected
> I1019 16:59:06.411717 2138112 master.cpp:1259] Agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S0 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) disconnected
> I1019 16:59:06.411767 2138112 master.cpp:2948] Disconnecting agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S0 at slave(1)@10.0.9.176:5051 
> (10.0.9.176)
> E1019 16:59:06.411854 4820992 process.cpp:2154] Failed to shutdown socket 
> with fd 9: Socket is not connected
> I1019 16:59:06.411887 2138112 master.cpp:2967] Deactivating agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S0 at slave(1)@10.0.9.176:5051 
> (10.0.9.176)
> I1019 16:59:06.412001 2674688 hierarchical.cpp:584] Agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S0 deactivated
> E1019 16:59:10.078735 4820992 process.cpp:2154] Failed to shutdown socket 
> with fd 9: Socket is not connected
> I1019 16:59:22.705441 3211264 master.cpp:5070] Removing old disconnected 
> agent 3615d417-b3fd-4e00-b794-9396456c3f6a-S0 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) because a registration attempt occurred
> I1019 16:59:22.705507 3211264 master.cpp:5111] Registering agent at 
> slave(1)@10.0.9.176:5051 (10.0.9.176) with id 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1
> I1019 16:59:22.705821 3747840 registrar.cpp:461] Applied 1 operations in 
> 90us; attempting to update the registry
> I1019 16:59:22.706418 528384 coordinator.cpp:348] Coordinator attempting to 
> write APPEND action at position 5
> I1019 16:59:22.707296 2674688 replica.cpp:537] Replica received write request 
> for position 5 from __req_res__(10)@10.0.9.176:5050
> I1019 16:59:22.710041 2138112 replica.cpp:691] Replica received learned 
> notice for position 5 from @0.0.0.0:0
> I1019 16:59:22.711555 3211264 registrar.cpp:506] Successfully updated the 
> registry in 5.64224ms
> I1019 16:59:22.711786 3747840 coordinator.cpp:348] Coordinator attempting to 
> write TRUNCATE action at position 6
> I1019 16:59:22.712194 1601536 master.cpp:5182] Registered agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) with cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000]
> I1019 16:59:22.712391 2138112 hierarchical.cpp:485] Added agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 (10.0.9.176) with cpus(*):8; 
> mem(*):15360; disk(*):470832; ports(*):[31000-32000] (allocated: {})
> I1019 16:59:22.712443 3211264 replica.cpp:537] Replica received write request 
> for position 6 from __req_res__(11)@10.0.9.176:5050
> I1019 16:59:22.712982 1601536 replica.cpp:691] Replica received learned 
> notice for position 6 from @0.0.0.0:0
> I1019 16:59:22.714227 4284416 master.cpp:5579] Received update of agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) with total oversubscribed resources {}
> I1019 16:59:22.714383 3747840 hierarchical.cpp:555] Agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 (10.0.9.176) updated with 
> oversubscribed resources {} (total: cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000], allocated: {})
> I1019 16:59:25.868541 3747840 master.cpp:5259] Re-registering agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176)
> I1019 16:59:25.868702 3747840 master.cpp:5518] Sending updated checkpointed 
> resources {} to agent 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at 
> slave(1)@10.0.9.176:5051 (10.0.9.176)
> I1019 16:59:25.869714 1064960 master.cpp:5579] Received update of agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) with total oversubscribed resources {}
> I1019 16:59:25.869993 3211264 hierarchical.cpp:555] Agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 (10.0.9.176) updated with 
> oversubscribed resources {} (total: cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000], allocated: {})
> I1019 16:59:41.006363 3211264 master.cpp:5259] Re-registering agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176)
> I1019 16:59:41.006510 3211264 master.cpp:5518] Sending updated checkpointed 
> resources {} to agent 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at 
> slave(1)@10.0.9.176:5051 (10.0.9.176)
> I1019 16:59:41.007469 2674688 master.cpp:5579] Received update of agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) with total oversubscribed resources {}
> I1019 16:59:41.007802 2138112 hierarchical.cpp:555] Agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 (10.0.9.176) updated with 
> oversubscribed resources {} (total: cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000], allocated: {})
> I1019 16:59:55.429986 2674688 master.cpp:5259] Re-registering agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176)
> I1019 16:59:55.430091 2674688 master.cpp:5518] Sending updated checkpointed 
> resources {} to agent 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at 
> slave(1)@10.0.9.176:5051 (10.0.9.176)
> I1019 16:59:55.430687 4284416 master.cpp:5579] Received update of agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176) with total oversubscribed resources {}
> I1019 16:59:55.430851 2674688 hierarchical.cpp:555] Agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 (10.0.9.176) updated with 
> oversubscribed resources {} (total: cpus(*):8; mem(*):15360; disk(*):470832; 
> ports(*):[31000-32000], allocated: {})
> I1019 17:00:10.371459 4284416 master.cpp:5259] Re-registering agent 
> 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at slave(1)@10.0.9.176:5051 
> (10.0.9.176)
> I1019 17:00:10.371608 4284416 master.cpp:5518] Sending updated checkpointed 
> resources {} to agent 3615d417-b3fd-4e00-b794-9396456c3f6a-S1 at 
> slave(1)@10.0.9.176:5051 (10.0.9.176)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to