-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34470/#review84515
-----------------------------------------------------------

Ship it!

- Dmitro Lisnichenko


On May 20, 2015, 2:24 p.m., Dmytro Sen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34470/
> -----------------------------------------------------------
> 
> (Updated May 20, 2015, 2:24 p.m.)
> 
> 
> Review request for Ambari, Dmitro Lisnichenko and Myroslav Papirkovskyy.
> 
> 
> Bugs: AMBARI-8768
>     https://issues.apache.org/jira/browse/AMBARI-8768
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Ambari agent is susceptible to hanging when the 'df' command blocks. This 
> causes loss of heartbeat and manageability. I've found this has happened with 
> NFS gateway's HDFS mount point blocking when HDFS isn't available (we had set 
> the NFS soft option on the mount point but then realized that wasn't a good 
> idea as not everyone's processes and scripts will handle failure gracefully 
> and retry properly).
> When restarting the agent, it also leaves the df process bound to port 8670, 
> which must be killed manually before the ambari agent can restart and bind 
> successfully. Even then, you'll see a hang at this point after connecting to 
> the 8440 ca, and the agent never fully initializes, so the heartbeat still 
> never comes back.
> The df command should either run in a separate thread, so that it does not 
> block the main heartbeat and management functions, or have a timeout set on 
> the command execution to prevent this issue.
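
A minimal sketch of the timeout option described above, not the change under
review in Hardware.py. It assumes Python 3 and the standard subprocess module;
the helper name run_with_timeout and the 5-second limit in the usage note are
hypothetical.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command and return its stdout, or None on timeout or failure.

    Hypothetical helper for illustration; not part of the Ambari code base.
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    try:
        # communicate() raises TimeoutExpired if the child has not
        # finished within timeout_seconds.
        out, _ = proc.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Best effort: ask the kernel to kill the child, but do not wait
        # for it, so the caller is never blocked by a stuck process.
        proc.kill()
        proc.stdout.close()
        return None
    # Treat a non-zero exit status as "no usable output".
    return out if proc.returncode == 0 else None
```

Usage could look like output = run_with_timeout(["df", "-kPT"], 5), where the
df flags are an assumption; a None result would mean skipping disk information
for that heartbeat instead of blocking it. A df stuck in uninterruptible I/O on
a hard NFS mount may not exit even on SIGKILL, which is why the sketch does not
wait for the child after killing it.
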
> 
> 
> Diffs
> -----
> 
>   ambari-agent/src/main/python/ambari_agent/Hardware.py 439803d 
>   ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java c2cf2c0 
>   ambari-server/src/test/java/org/apache/ambari/server/agent/TestHeartbeatHandler.java 2b1c355 
> 
> Diff: https://reviews.apache.org/r/34470/diff/
> 
> 
> Testing
> -------
> 
> unit tests passed
> 
> 
> Thanks,
> 
> Dmytro Sen
> 
>
