AW: help needed with nagios alert

Heinze, Markus Tue, 27 Jan 2015 03:08:18 -0800

Hi,

This is just an attempt to sort a few things out.


I don't know much about Hadoop clusters, but I assume a cluster consists of 
several cluster nodes.
Do you check the free disk space on the master node only, or on each node 
independently?
Do you have one entry in hosts.cfg for the world-accessible Hadoop cluster 
IP/DNS name and separate entries for each cluster node?


We run a small Linux web cluster with replicated MySQL databases and 
web directories.
For replication we use DRBD, with Pacemaker as the resource manager.
We get alerts for the cluster as a whole and for each cluster node.


So I use two different check_disk alerts. The first is for the replicated 
volume: check_linux_drbd0_disk.
Its volume size and free disk space are the same on every cluster node.

The second check_disk alert checks the local disk in each cluster node: 
check_linux_root_disk.
This is the physical HDD installed in each cluster node.


$HOSTADDRESS$:
For check_linux_drbd0_disk it is the active, world-accessible address, for 
example www.example.com.
For check_linux_root_disk it is the internal address of each cluster node, for 
example clusternode1.internal.com and clusternode2.internal.com.
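
As a rough sketch of what that means in hosts.cfg (not copied from our 
configuration: the host_name values and the "linux-server" template from the 
Nagios sample config are assumptions for illustration), there is one host 
entry for the cluster address and one per cluster node:

```cfg
# Hypothetical hosts.cfg sketch; host_name values are placeholders.
# Assumes a "linux-server" host template exists (as in the Nagios sample config).

# The cluster as a whole, reached via its active, world-accessible address
define host{
        use             linux-server
        host_name       webcluster
        address         www.example.com
        }

# One entry per physical cluster node, reached via its internal address
define host{
        use             linux-server
        host_name       clusternode1
        address         clusternode1.internal.com
        }

define host{
        use             linux-server
        host_name       clusternode2
        address         clusternode2.internal.com
        }
```

Because $HOSTADDRESS$ expands to the address of the host a check runs 
against, each host entry results in a query to its own NRPE daemon.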


The objects/commands.cfg:
define command{
        command_name    check_linux_drbd0_disk
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_drbd0
        }


define command{
        command_name    check_linux_root_disk
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_sda1
        }


The /usr/local/nagios/etc/nrpe.cfg on each clusternode:
command[check_drbd0]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/drbd0
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda1


With this, we get alerts:
Running out of disk space for www.example.com
Running out of disk space for each clusternode
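
To tie the commands to the hosts, the service definitions could look roughly 
like the following. This is only a sketch: the host_name values, the 
service_description texts and the "generic-service" template from the Nagios 
sample config are assumptions, not taken from our setup.

```cfg
# Hypothetical services.cfg sketch; names are placeholders.
# Assumes a "generic-service" template exists (as in the Nagios sample config).

# Replicated DRBD volume: checked once, via the cluster-wide host entry
define service{
        use                     generic-service
        host_name               webcluster
        service_description     DRBD volume disk space
        check_command           check_linux_drbd0_disk
        }

# Local root disk: checked separately on every cluster node
define service{
        use                     generic-service
        host_name               clusternode1,clusternode2
        service_description     Root disk space
        check_command           check_linux_root_disk
        }
```

With separate host entries per node, a full disk on one node should only 
raise the alert for that node's service.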


Regards,
Markus.






From: Help [mailto:help-bounces+markus.heinze=esta-bw...@monitoring-plugins.org] 
On behalf of Natva, Arun Kumar
Sent: Friday, 23 January 2015 23:47
To: help@monitoring-plugins.org
Subject: help needed with nagios alert

Hi,
I am using Nagios for alerting in our Hadoop cluster.

When I set up a check_disk alert on all the nodes in the cluster, we get 
emails for all the hosts even though only one of the nodes exceeds the disk 
space threshold.

I have tried several things, but I am unable to figure out why Nagios sends 
alerts for all hosts instead of just one host. Can you please help?

Regards,
Arun.
