-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34470/#review84515
-----------------------------------------------------------
Ship it!


Ship It!

- Dmitro Lisnichenko


On May 20, 2015, 2:24 p.m., Dmytro Sen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34470/
> -----------------------------------------------------------
> 
> (Updated May 20, 2015, 2:24 p.m.)
> 
> 
> Review request for Ambari, Dmitro Lisnichenko and Myroslav Papirkovskyy.
> 
> 
> Bugs: AMBARI-8768
>     https://issues.apache.org/jira/browse/AMBARI-8768
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Ambari agent is susceptible to hanging when the 'df' command blocks. This
> causes loss of heartbeat and manageability. I've found this has happened
> with the NFS gateway's HDFS mount point blocking when HDFS isn't available.
> (We had set the NFS soft option on the mount point, but then realized that
> wasn't a good idea, as not everyone's processes and scripts will handle
> failure gracefully and retry properly.)
> When the agent is restarted, the hung df process is also left bound to
> port 8670, and it must be killed manually before the agent can restart and
> bind successfully. Even then, the agent hangs after connecting to the CA on
> port 8440 and never fully initializes, so the heartbeat still never comes
> back.
> The df command should either run in a separate thread, so that it does not
> block the main heartbeat and management functions, or have a timeout set on
> its execution to prevent this issue.
> 
> 
> Diffs
> -----
> 
>   ambari-agent/src/main/python/ambari_agent/Hardware.py 439803d
>   ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java c2cf2c0
>   ambari-server/src/test/java/org/apache/ambari/server/agent/TestHeartbeatHandler.java 2b1c355
> 
> Diff: https://reviews.apache.org/r/34470/diff/
> 
> 
> Testing
> -------
> 
> Unit tests passed.
> 
> 
> Thanks,
> 
> Dmytro Sen
> 
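For reference, the timeout option from the description could look roughly like the sketch below. It assumes Python 3's `subprocess.run`. The helper name `run_with_timeout`, the 10-second default and the `df -kP` invocation are illustrative assumptions, not what the Hardware.py patch actually does.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_sec: float = 10.0) -> Optional[str]:
    """Run cmd and return its decoded stdout, or None on failure or timeout.

    Hypothetical helper (not the code from the Hardware.py patch): it bounds
    how long a possibly-blocking command such as `df` can stall the caller,
    e.g. the agent's heartbeat path.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            # Don't let the child inherit the agent's open sockets; an
            # inherited listening socket would explain a hung df holding
            # port 8670. This is already the POSIX default since Python 3.2.
            close_fds=True,
            timeout=timeout_sec,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return None
    except OSError:
        # Command not found, permission denied, etc.
        return None
    if result.returncode != 0:
        return None
    return result.stdout.decode(errors="replace")


if __name__ == "__main__":
    # GNU df options: -k (1K blocks), -P (POSIX output format).
    output = run_with_timeout(["df", "-kP"], timeout_sec=10.0)
    if output is None:
        print("df failed or timed out; skipping disk info", file=sys.stderr)
    else:
        print(output, end="")
```

On timeout, the caller simply reports no disk info for that heartbeat instead of blocking. One caveat: if df is stuck in uninterruptible I/O on a hard NFS mount, even SIGKILL may not take effect immediately, so the wait inside `subprocess.run` could still stall. That may be a reason to combine the timeout with the separate-thread option from the description.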
