[ 
https://issues.apache.org/jira/browse/AMBARI-19416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827699#comment-15827699
 ] 

Hudson commented on AMBARI-19416:
---------------------------------

FAILURE: Integrated in Jenkins build Ambari-branch-2.5 #749 (See 
[https://builds.apache.org/job/Ambari-branch-2.5/749/])
AMBARI-19416. Ambari agents remain in heartbeat lost state after ambari 
(stoader: 
[http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=36f742246530af98913051bb27dcd3b20368e474])
* (edit) ambari-agent/src/main/python/ambari_agent/main.py
* (edit) ambari-agent/src/test/python/ambari_agent/TestHeartbeat.py
* (edit) ambari-agent/src/test/python/ambari_agent/TestMain.py
* (edit) ambari-agent/src/main/python/ambari_agent/Controller.py
* (edit) ambari-agent/src/main/python/ambari_agent/ActionQueue.py


> Ambari agents remain in heartbeat lost state after ambari server restart
> ------------------------------------------------------------------------
>
>                 Key: AMBARI-19416
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19416
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Sebastian Toader
>            Assignee: Sebastian Toader
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: AMBARI-19416.v3.patch
>
>
> With the implementation of https://issues.apache.org/jira/browse/AMBARI-18505 
> status commands are executed in a separate child process. Status 
> commands received from the server by the Ambari agent are passed to the status 
> command executor child process via a queue ({{multiprocessing.Queue()}}). If 
> the child process is killed, either manually or by the parent process, 
> the queue may end up in a bad state (see: http://bugs.python.org/issue20527), 
> so the re-spawned status command executor child process may no longer receive 
> new status commands.
> When the Ambari server is restarted, the agent re-registers with it and, 
> upon re-registration, re-spawns the status command child process in order 
> to receive up-to-date agent configs 
> (https://issues.apache.org/jira/browse/AMBARI-19392). In this case the status 
> commands may not be received by the status command executor child process, 
> because the queue may be stuck, leaving the Ambari agent in the heartbeat lost 
> state.
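
One way to avoid relying on a queue that a killed child may have left in a bad
state is to create fresh queues on every re-spawn. The sketch below only
illustrates that idea; it is not taken from the attached patch. The names
StatusExecutorHandle and _status_worker, the "done:" reply format, and the use
of the "fork" start method (a POSIX host is assumed) are all hypothetical.

```python
import multiprocessing

# Assumption: a POSIX host, so the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


def _status_worker(commands, results):
    """Child process: consume status commands until a None sentinel arrives."""
    while True:
        command = commands.get()
        if command is None:
            break
        results.put("done:" + command)


class StatusExecutorHandle(object):
    """Hypothetical parent-side handle owning the child and its queues."""

    def __init__(self):
        self.commands = None
        self.results = None
        self.process = None

    def spawn(self):
        # Always build new queues; never reuse ones a killed child touched,
        # since it may have died while holding a queue lock.
        self.commands = _ctx.Queue()
        self.results = _ctx.Queue()
        self.process = _ctx.Process(
            target=_status_worker, args=(self.commands, self.results))
        self.process.daemon = True
        self.process.start()

    def kill(self):
        if self.process is not None and self.process.is_alive():
            self.process.terminate()
            self.process.join(5)
        self.process = None

    def respawn(self):
        # Called, for example, after re-registration with the server.
        self.kill()
        self.spawn()

    def run(self, command, timeout=5):
        # Raises queue.Empty if the child does not answer within the timeout.
        self.commands.put(command)
        return self.results.get(timeout=timeout)
```

With this arrangement the re-spawned child never touches the old queue
objects, so whatever state the killed child left them in no longer matters.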



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
