[ https://issues.apache.org/jira/browse/MESOS-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann updated MESOS-5635:
-----------------------------
    Summary: Agent repeatedly reregisters, possible one-way disconnection  
(was: Agent repeatedly reregisters, possible one-way partition)

> Agent repeatedly reregisters, possible one-way disconnection
> ------------------------------------------------------------
>
>                 Key: MESOS-5635
>                 URL: https://issues.apache.org/jira/browse/MESOS-5635
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Greg Mann
>              Labels: agent, mesosphere
>
> This issue was observed recently on an internal test cluster. Due to a bug in 
> the agent code (MESOS-5629), regular segfaults were occurring on an agent. 
> While the agent was recovering from one of these failures, it segfaulted 
> again. Afterward, the agent began recovery but never printed 
> {{Finished recovery}}, and its logs showed no indication of reregistering 
> with the master. The master's logs, however, repeatedly showed the following 
> line, at intervals on the order of seconds:
> {code}
> W0617 21:27:07.010679  2016 master.cpp:4773] Agent 
> 2b899dd3-3b1f-4520-a6b2-98e32196f723-S4 at slave(1)@10.10.0.87:5051 
> (10.10.0.87) attempted to re-register after removal; shutting it down
> {code}
> These re-registration attempts had no corresponding lines in the agent log.
> Deleting the contents of the agent's {{work_dir}} and then restarting the 
> agent led to a successful registration with a new agent ID:
> {code}
> I0617 21:29:01.246119  2011 master.cpp:4635] Registering agent at 
> slave(1)@10.10.0.87:5051 (10.10.0.87) with id 
> 2b899dd3-3b1f-4520-a6b2-98e32196f723-S5
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
