Sebastian Toader created AMBARI-19416:
-----------------------------------------
Summary: Ambari agents remain in heartbeat lost state after ambari
server restart
Key: AMBARI-19416
URL: https://issues.apache.org/jira/browse/AMBARI-19416
Project: Ambari
Issue Type: Bug
Reporter: Sebastian Toader
Assignee: Sebastian Toader
Priority: Critical
Fix For: 3.0.0
With the implementation of https://issues.apache.org/jira/browse/AMBARI-18505,
status commands are executed in a separate child process. Status commands
received from the server by the ambari agent are passed to the status command
executor child process via a queue ({{multiprocessing.Queue()}}). If the child
process is killed, either manually or by the parent process, the queue may end
up in a bad state (see: http://bugs.python.org/issue20527), so the re-spawned
status command executor child process may no longer receive new status
commands.
When ambari server is restarted, the agent re-registers with ambari server and,
upon re-registration, re-spawns the status command child process so that it
picks up up-to-date agent configs
(https://issues.apache.org/jira/browse/AMBARI-19392). In this case the status
commands may not be received by the re-spawned status command executor child
process because the queue may be stuck, which leaves the ambari agent in
heartbeat lost state.
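The parent/child arrangement described above, and one possible way to avoid
depending on a queue that a killed child may have left in a bad state, can be
sketched as follows. This is only an illustration and not necessarily the fix
applied for this issue: {{StatusExecutorHandle}}, {{status_executor}} and the
second results queue are hypothetical names, and the sketch assumes a POSIX
host where the "fork" start method is available. The idea is that every
re-spawn creates fresh queues instead of reusing the old ones.

```python
import multiprocessing as mp

# Assumption: POSIX host, so the "fork" start method is available.
_ctx = mp.get_context("fork")


def status_executor(commands, results):
    """Child process loop: handle commands until a None sentinel arrives."""
    while True:
        command = commands.get()
        if command is None:
            break
        # Placeholder for real status command execution.
        results.put("executed " + command)


class StatusExecutorHandle(object):
    """Hypothetical parent-side wrapper around the child and its queues."""

    def __init__(self):
        self.process = None
        self.commands = None
        self.results = None

    def respawn(self):
        """Kill the current child (if any) and start a new one."""
        if self.process is not None:
            # The child is most likely blocked in commands.get() here, so
            # killing it may leave the old queue in an unusable state.
            self.process.terminate()
            self.process.join()
            # Discard the old queues instead of reusing them.
            for old_queue in (self.commands, self.results):
                old_queue.close()
                old_queue.cancel_join_thread()
        # Fresh queues for the new child.
        self.commands = _ctx.Queue()
        self.results = _ctx.Queue()
        self.process = _ctx.Process(
            target=status_executor, args=(self.commands, self.results))
        self.process.daemon = True
        self.process.start()

    def submit(self, command):
        """Send one command to the child and wait for its result."""
        self.commands.put(command)
        return self.results.get(timeout=10)

    def stop(self):
        """Ask the child to exit cleanly via the None sentinel."""
        self.commands.put(None)
        self.process.join(timeout=10)
```

In this sketch the agent would call {{respawn()}} once at startup and again
after each re-registration; because the queues are recreated each time, the
new child does not depend on the state of the queue used by the killed one.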
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)