2009/3/24 Christopher McAtackney <crist...@gmail.com>:
> Hi all,
>
> I was wondering if someone could give a brief overview of the pros /
> cons of using NRPE to monitor my remote hosts versus using the
> check_by_ssh command?
>
> I'm aware that check_by_ssh increases the CPU overhead, but I'm not
> clear on the level of impact here - does this increase the load on the
> monitoring machine in direct relation to the number of hosts being
> monitored? For example, if I was using check_by_ssh to monitor, say,
> 2000 services spread across 200 hosts, would I experience significant
> slowdown on my monitoring machine?
>
> Cheers for any info,
>
> Chris
>

SSH is going to slow it down on both sides of the communication. SSH does quite a bit more work to set up a connection: it uses asymmetric encryption to establish a shared secret for the symmetric encryption, verifies keys for the asymmetric part, verifies access, and allocates a session. NRPE, even with encryption enabled, does a far simpler setup of the secret for the symmetric encryption, which is much faster even if both use the same encryption algorithm.

One thing you could do with SSH to speed it up (and I would argue make it faster than NRPE, depending on the stability of your network) would be to use ControlMaster. ControlMaster is an SSH v2 feature where you create one connection and other SSH processes can open multiple sessions over it. This saves you not only the heavy lifting of the key exchange; you're also not opening a new socket on the remote host for every check. To really make it worth it, you'd have to spawn a process that stays continuously connected.

I wrote an ugly check_by_ssh that would spawn a ControlMaster if one didn't exist and use it if it did. That reduced the load and latency quite a bit for SSH checks. If I had to do it again, though, I'd use 'ControlMaster auto' (man 5 ssh_config) and create a separate check that was responsible for maintaining the ControlMaster; then you could use the stock check_by_ssh without any modifications.

That all being said, you might want to think about a distributed setup anyhow, if for nothing more than redundancy. 200 servers and 2,000 checks is a lot of responsibility for a singleton. You could break it 50/50 between two servers, each of which could take over for the other if it fails.
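
To make the ControlMaster idea above a bit more concrete, here is a rough Python sketch of the "separate check that maintains the master" approach. It is not the script I actually ran. The socket path and all function names are made up for illustration; the ssh flags themselves (-M, -N, -f, -O check, and the ControlMaster / ControlPath / BatchMode options) are standard OpenSSH.

```python
import subprocess

# Hypothetical socket location; ssh itself expands %r, %h and %p.
CONTROL_PATH = "/tmp/nagios-ssh-%r@%h:%p"


def mux_options(control_path=CONTROL_PATH):
    """ssh options that reuse a shared connection if one exists."""
    return ["-o", "ControlMaster=auto", "-o", "ControlPath=" + control_path]


def master_check_command(host, control_path=CONTROL_PATH):
    """Command asking an existing master whether it is alive (ssh -O check)."""
    return ["ssh", "-o", "ControlPath=" + control_path, "-O", "check", host]


def master_start_command(host, control_path=CONTROL_PATH):
    """Command starting a master: -M master mode, -N no remote command,
    -f go to the background once connected."""
    return ["ssh", "-M", "-N", "-f", "-o", "BatchMode=yes",
            "-o", "ControlPath=" + control_path, host]


def remote_check_command(host, remote_command, control_path=CONTROL_PATH):
    """Command running one check over the shared connection."""
    return (["ssh"] + mux_options(control_path)
            + ["-o", "BatchMode=yes", host, remote_command])


def ensure_master(host, run=subprocess.call):
    """Start a master for host unless one already answers.

    'run' takes an argv list and returns the exit status; it is a
    parameter only so the logic can be exercised without a network.
    Returns True if a new master was started, False otherwise.
    """
    if run(master_check_command(host)) == 0:
        return False
    run(master_start_command(host))
    return True
```

Scheduling something like ensure_master() as its own service check per host keeps the master alive, and the regular checks then only need the two options from mux_options(). Those can just as well go into the ssh_config of the user Nagios runs as, so the stock check_by_ssh picks them up unchanged.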
.r'