[ https://issues.apache.org/jira/browse/HADOOP-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646010#action_12646010 ]
Edward Capriolo commented on HADOOP-4594:
-----------------------------------------

I have read up on Chukwa and HADOOP-3628, and I think ping() would work well with Nagios. My goal was to provide something that works today. I agree that jps and grep is not a great way to monitor anything, but I also believe in the 80/20 rule. Checking the reply of each component's web interface is a step better.

I was thinking checks like these might be meaningful and useful without being complicated:

NumberOfDeadNodes > X -- This alarm would go off if the number of dead nodes in the cluster goes higher than X.
PercentageOfDeadNodes > X -- This would alarm if the percentage of dead nodes goes higher than X.
WriteFileReadFile (String hdfspath) -- This would attempt to write and read a file.
ReadFile (String hdfspath) -- This would attempt to read a file.
TotalFreeDFSPrecent < X -- This would alarm when the free DFS space falls below a certain value.

These are some things that someone in an administrative role would want.

> Monitoring Scripts for Nagios
> -----------------------------
>
>                 Key: HADOOP-4594
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4594
>             Project: Hadoop Core
>          Issue Type: Wish
>            Reporter: Edward Capriolo
>            Priority: Minor
>         Attachments: HADOOP-4594.patch
>
>
> I would like to create a set of local (via NRPE) and remote check scripts that
> can be shipped with the Hadoop distribution and used to monitor Hadoop. I
> have already completed the NRPE scripts. The second set of scripts would use
> wget to read the output of the Hadoop web interfaces. Do these already exist?
> I guess these would fall under a new contrib project.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
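
The threshold alarms proposed in the comment (NumberOfDeadNodes > X, PercentageOfDeadNodes > X, TotalFreeDFSPrecent < X) could be sketched roughly as follows. This is only an illustration and is not taken from HADOOP-4594.patch: the function names check_above, check_below and percent_dead are hypothetical, the split of X into separate warning and critical thresholds is an assumption, and how the node counts and free-space figure are obtained from the cluster is deliberately left out. The only fixed convention used is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Sketch of Nagios-style threshold checks. All function names here are
# hypothetical; the input values are assumed to be gathered elsewhere
# (for example by an NRPE script or by parsing a web interface).

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_above(label, value, warn, crit):
    """Alarm when value rises above a threshold (e.g. NumberOfDeadNodes > X)."""
    if value > crit:
        return CRITICAL, f"CRITICAL - {label}={value} (> {crit})"
    if value > warn:
        return WARNING, f"WARNING - {label}={value} (> {warn})"
    return OK, f"OK - {label}={value}"


def check_below(label, value, warn, crit):
    """Alarm when value falls below a threshold (e.g. TotalFreeDFSPrecent < X)."""
    if value < crit:
        return CRITICAL, f"CRITICAL - {label}={value} (< {crit})"
    if value < warn:
        return WARNING, f"WARNING - {label}={value} (< {warn})"
    return OK, f"OK - {label}={value}"


def percent_dead(dead_nodes, total_nodes):
    """Return the percentage of dead nodes, for PercentageOfDeadNodes > X."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return 100.0 * dead_nodes / total_nodes
```

A wrapper script would print the returned message and exit with the returned code, for example `code, msg = check_above("NumberOfDeadNodes", 3, warn=1, crit=5)` followed by `print(msg)` and `sys.exit(code)`, so that Nagios or NRPE can pick up the status.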