We're monitoring around 1000 hosts and 4700 services.
We are using the last version of Opsview Community, like Simon, although his 
setup sounds a bit more fault tolerant.

We have 1 master server with about 30 slave servers monitoring various remote 
sites.
Ease of distributed setup was what won us over to Opsview several years ago, 
but since they moved the distributed version to the commercial enterprise 
edition I am starting to pay attention again to the different variants of 
Nagios out there.
The centralized web-based configuration front end was another huge plus, as 
engineers don't have to understand Nagios to set up hosts.

The Racon setup sounds like some good stuff.

James

-----Original Message-----
From: Simone Felici [mailto:s.fel...@mclink.eu] 
Sent: Friday, May 18, 2012 3:33 AM
To: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] How many hosts and services are you monitoring with 
Nagios?


Impressive :)
We're monitoring ~2000 hosts and ~10000 services, every 5 minutes.
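
As a rough back-of-the-envelope sketch, the figures quoted in this thread can be turned into a sustained check rate. This assumes checks are spread evenly over the interval, which a real scheduler only approximates, and the function name is made up for illustration.

```python
def checks_per_second(services: int, interval_minutes: float) -> float:
    """Average checks per second if `services` checks are spread evenly over one interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return services / (interval_minutes * 60)


# Figures quoted in this thread, both on a 5-minute interval.
for label, services in [("~10000 services", 10_000),
                        ("100k+ active services", 100_000)]:
    print(f"{label}: {checks_per_second(services, 5):.1f} checks/s")
```

That works out to roughly 33 checks per second for the ~10000-service setup and roughly 333 per second for the 100k+ setup, before counting passive results.
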
Architecture used: OPSView Community edition, the last free version before it 
started to make the distributed version commercial :/ Two central servers 
(active/standby - drbd) as single point for management and collecting all 
passive checks executed by the slave servers. Performance data is saved into 
RRD files as well as on an external BIG database server. Configuration resides 
on a 
cluster MySQL installation (drbd).
4 slave "datacenter installations" with 2 servers per "datacenter" in 
active/active load balancing.
Trap handling is supported on all servers, with rule-based logic.
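
For readers unfamiliar with passive checks: in stock Nagios, a result computed elsewhere is handed to the receiving server as a PROCESS_SERVICE_CHECK_RESULT external command. The sketch below only formats that line. How Opsview actually ships results from the slaves to the central servers is not described in this thread, and the function name, host name and service name are placeholders.

```python
import time


def passive_result_line(host, service, return_code, output, timestamp=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):  # OK, WARNING, CRITICAL, UNKNOWN
        raise ValueError("return_code must be 0, 1, 2 or 3")
    for field in (host, service):
        # ';' separates fields, so it cannot appear in the names.
        if ";" in field or "\n" in field:
            raise ValueError("host and service must not contain ';' or newlines")
    if timestamp is None:
        timestamp = int(time.time())
    # Simplification: keep only the first line of the plugin output.
    lines = output.splitlines()
    first_line = lines[0] if lines else ""
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{first_line}")


# Placeholder names, for illustration only.
print(passive_result_line("web01", "HTTP", 0, "HTTP OK", timestamp=1337326380))
```
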
Pros:
- Open Source: at least until version 3, for our setup. A simple single-instance 
setup with fewer functions is also available in version 4.
- Easy to manage: the purpose was to create a monitoring system and then hand 
its management over to other people with fewer technical skills
- distributed setup
- RBAC
Disadvantages:
- no longer Open Source: see above
- Central server CPU suffers from the GUI implementation and other background 
jobs
- Not all Nagios parameters are editable as we would like: e.g. we cannot give 
the same check different intervals without re-creating it as a new one. Think 
of an HTTP service on servers with different loads and the need to extend the 
retries on high-load servers. There is no way to do it except by creating 
separate "HTTP" and "HTTP High Load" services.
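
In plain Nagios object configuration, the workaround described above amounts to two service definitions that differ only in their retry settings. The sketch below renders such a pair as text. The `generic-service` template, the `check_http` command name, the hostgroup names and the attempt counts (3 and 6) are all assumptions for illustration, not taken from anyone's actual setup.

```python
def render_service(description, hostgroup, max_check_attempts,
                   check_interval=5, retry_interval=1):
    """Render one Nagios `define service` object as text."""
    directives = [
        ("use", "generic-service"),        # assumed template name
        ("hostgroup_name", hostgroup),     # hypothetical hostgroup
        ("service_description", description),
        ("check_command", "check_http"),   # assumed to be defined elsewhere
        ("max_check_attempts", max_check_attempts),
        ("check_interval", check_interval),
        ("retry_interval", retry_interval),
    ]
    lines = [f"    {name:<24}{value}" for name, value in directives]
    return "define service{\n" + "\n".join(lines) + "\n}\n"


# Same check, but more retries before alerting on the busy servers.
normal = render_service("HTTP", "web-servers", max_check_attempts=3)
high_load = render_service("HTTP High Load", "web-servers-high-load",
                           max_check_attempts=6)
print(normal)
print(high_load)
```
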
Maybe there are more pros (and disadvantages), but this is not the right place.
BTW, I look forward to this solution; it seems interesting!

Simon

On 17/05/2012 16:43, Max Schubert wrote:
> Hi,
>
> I like it when people periodically post numbers and architecture 
> summaries, I am guessing with the distributed frameworks out now for 
> Nagios this thread might be seeing bigger numbers than past threads 
> have.
>
> With our custom-built distributed Nagios-based monitoring system, we 
> are currently monitoring 18000+ hosts every 5 minutes and 100k+ active 
> services (plenty of passive services in addition to the actives) every
> 5 mins as well.  We collect performance data from every check as well 
> and pass that on to a highly distributed and scalable time-series data 
> warehouse another team in our organization has built (which is why we 
> have the 5 min interval requirement)
>
> We also do trap ingest using SNMPTT with a few custom mods, but not 
> going to include those numbers as they never have required the 
> optimizations the polling has required.
>
> This isn't a monolithic instance, we have 6 projects using instances 
> of our distributed Nagios-based software, called Racon (soon my 
> manager will give our team to package it as open source - so I hear at 
> least).  We built it on core Nagios with a custom database layer based 
> on a very very early version of Merlin's database abstraction layer 
> (thank you Andreas!) - we have a custom client/server network-based 
> notification framework in use (we will release that as well) along 
> with a custom NEB/perl based client-server framework (also releasable, 
> just need time scheduled) for sending and processing performance data
> - the performance and notification framework are both horizontally 
> scalable and network fault tolerant.
>
> What kinds of numbers of hosts and services are you all monitoring?
> Which add-ons / distributed frameworks are you using?

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat 
landscape has changed and how IT managers can respond. Discussions will include 
endpoint security, mobile security and the latest in malware threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

