Sebastian Toader created AMBARI-19416:
-----------------------------------------

             Summary: Ambari agents remain in heartbeat lost state after ambari 
server restart
                 Key: AMBARI-19416
                 URL: https://issues.apache.org/jira/browse/AMBARI-19416
             Project: Ambari
          Issue Type: Bug
            Reporter: Sebastian Toader
            Assignee: Sebastian Toader
            Priority: Critical
             Fix For: 3.0.0


Since https://issues.apache.org/jira/browse/AMBARI-18505 was implemented, status 
commands are executed in a separate child process. Status commands that the 
Ambari agent receives from the server are passed to the status command executor 
child process via a queue ({{multiprocessing.Queue()}}). If the child process is 
killed, either manually or by the parent process, the queue may end up in a bad 
state (see: http://bugs.python.org/issue20527), so the re-spawned status command 
executor child process may no longer receive new status commands.
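
The setup described above can be sketched as follows. This is only a minimal 
illustration, not the agent's actual code: {{StatusExecutorSupervisor}}, 
{{run_status_executor}} and the command strings are hypothetical names, and the 
sketch assumes Python 3.4+ on a POSIX host (it uses the "fork" start method). 
It also shows one possible mitigation, which is an assumption and not 
necessarily the fix adopted for this issue: discard the old queues and create 
fresh ones every time the child process is re-spawned.

```python
import multiprocessing

# Assumption: POSIX "fork" start method (Python 3.4+), as on a Linux agent host.
_ctx = multiprocessing.get_context("fork")


def run_status_executor(command_queue, result_queue):
    """Child process loop: handle status commands until a None sentinel."""
    while True:
        command = command_queue.get()
        if command is None:
            break
        result_queue.put("status of " + command)


class StatusExecutorSupervisor(object):
    """Hypothetical parent-side wrapper; not Ambari's actual class."""

    def __init__(self):
        self.process = None
        self.command_queue = None
        self.result_queue = None

    def respawn(self):
        # Kill the old child, if any. It may die while holding a queue lock,
        # which is how a reused queue can get stuck.
        if self.process is not None and self.process.is_alive():
            self.process.terminate()
            self.process.join()
        # Discard the old queues instead of handing them to the new child.
        for old_queue in (self.command_queue, self.result_queue):
            if old_queue is not None:
                old_queue.close()
                old_queue.join_thread()
        # Fresh queues for the fresh child.
        self.command_queue = _ctx.Queue()
        self.result_queue = _ctx.Queue()
        self.process = _ctx.Process(
            target=run_status_executor,
            args=(self.command_queue, self.result_queue),
        )
        self.process.daemon = True
        self.process.start()

    def execute(self, command, timeout=10):
        # Raises queue.Empty if the child does not answer within the timeout.
        self.command_queue.put(command)
        return self.result_queue.get(timeout=timeout)

    def stop(self):
        # Ask the child to exit cleanly and wait for it.
        self.command_queue.put(None)
        self.process.join()
```

With this shape, calling {{respawn()}} again (for example on re-registration) 
kills the old child and starts a new one that reads from a queue no dead 
process has ever touched.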

When the Ambari server is restarted, the agent re-registers with it and, upon 
re-registration, re-spawns the status command executor child process so that it 
receives up-to-date agent configs 
(https://issues.apache.org/jira/browse/AMBARI-19392). In this case the status 
command executor child process may not receive status commands because the 
queue may be stuck, which leaves the Ambari agent in heartbeat lost state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
