Hi, this is just an attempt to sort a few things out.
I don't know much about Hadoop clusters, but I assume a cluster means several distinct cluster nodes. Did you check free disk space against the master node, or against each node independently? Do you have one entry in hosts.cfg for the world-accessible Hadoop cluster IP/DNS name and separate entries for each cluster node?

We use a small Linux web cluster with replicated MySQL databases and web directories. For replication we use DRBD, with Pacemaker as the resource manager. We get alerts for the whole cluster and for each cluster node.

So I use two different check_disk alerts. The first is for the replicated volume: check_linux_drbd0_disk. Volume size and free disk space are the same on every cluster node. The second checks the real HDD in each cluster node: check_linux_root_disk. That is the physical disk plugged into each node.

$HOSTADDRESS$:
For check_linux_drbd0_disk it is the active, world-accessible address, for example www.example.com.
For check_linux_root_disk it is the internal address of each cluster node, for example clusternode1.internal.com, clusternode2.internal.com.

The objects/commands.cfg:

define command{
    command_name check_linux_drbd0_disk
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_drbd0
}

define command{
    command_name check_linux_root_disk
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_sda1
}

The /usr/local/nagios/etc/nrpe.cfg on each cluster node:

command[check_drbd0]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/drbd0
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda1

With this, we get two kinds of alerts:
Running out of disk space for www.example.com
Running out of disk space for each cluster node

Regards,
Markus.

From: Help [mailto:help-bounces+markus.heinze=esta-bw...@monitoring-plugins.org] On Behalf Of Natva, Arun Kumar
Sent: Friday, 23 January 2015 23:47
To: help@monitoring-plugins.org
Subject: help needed with nagios alert

Hi,

I am using Nagios for alerting in our Hadoop cluster. When I set up a check_disk alert on all the nodes in the cluster, we get emails for all the hosts even though only one of the nodes exceeds the disk space threshold. I have tried multiple things, but I am unable to figure out why Nagios sends alerts for all hosts instead of just one host. Can you please help?

Regards,
Arun.
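
As a rough sketch of how the two commands from the reply above could be bound to different hosts, the object definitions might look something like the following. The aliases, the service descriptions, and the linux-server / generic-service templates (borrowed from the Nagios sample configuration) are assumptions for illustration and not part of the setup described in the reply. The idea is that each service names only the host it should run against, so a full disk on one node should raise an alert for that node alone.

```
# Assumed host entries: one for the public cluster address, one per node.
define host{
    use        linux-server              ; assumed template from the sample config
    host_name  www.example.com
    alias      Web cluster (active address)
    address    www.example.com
}

define host{
    use        linux-server
    host_name  clusternode1.internal.com
    alias      Cluster node 1
    address    clusternode1.internal.com
}

define host{
    use        linux-server
    host_name  clusternode2.internal.com
    alias      Cluster node 2
    address    clusternode2.internal.com
}

# Replicated volume: checked once, through the public address.
define service{
    use                  generic-service ; assumed template from the sample config
    host_name            www.example.com
    service_description  DRBD volume disk space
    check_command        check_linux_drbd0_disk
}

# Physical disk: checked on every node through its internal address.
define service{
    use                  generic-service
    host_name            clusternode1.internal.com,clusternode2.internal.com
    service_description  Root disk space
    check_command        check_linux_root_disk
}
```

If every node reports the same alert, one thing worth checking is whether the check ends up querying the same address (for example the cluster address) for all hosts instead of each node's own address.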