Sorry, I fell off the face of the earth with our end-of-year closure and vacation...
"Large" at the moment is about 1000 hosts being monitored with a mix of Nagios (formerly via GroundWork), Cacti, Ganglia, and a handful of in-house developed tools. It could potentially expand to several thousand hosts if we decide to start monitoring more than just our core infrastructure.

At a high level, the requirements are:

- to monitor the status of hosts, network services, host resources, and applications;
- to display this information via a web interface;
- to have management and application owner "dashboards";
- to notify of outages and escalate those notifications;
- and, in a perfect world, to open tickets in our ticketing system.

I would also like to see an API, both to manage the system and to give us a way to automate the addition and deletion of the hosts and services being monitored. All of the above should be configurable through both a web-based and a command-line interface. Having integrated monitoring and metric collection would be a plus, i.e. not having separate non-integrated applications like we do today with Nagios and Cacti/Ganglia. And I want a pony! :-)

"Hosts" in the above statement might be Linux, Windows, Mac, network devices, storage nodes or embedded devices. "Services" includes traditional network services like DNS, LDAP, SMTP, etc., as well as custom in-house developed applications that we will need to build custom tools to monitor. "Applications" refers to end-to-end application monitoring, such as making JMX queries for Java-based applications or running transactions against a web site.

We have most of the above with our current system, but it's a non-integrated hack.

I have looked at Opsview in the past and it may be the best choice for us, as we can migrate our existing system to it with very little effort. But it still has the same limitations as Nagios (primarily the lack of any application monitoring, just hosts and services).
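To make the "custom tools" requirement above a bit more concrete: a check for an in-house application in a Nagios-compatible system usually boils down to a small plugin that prints one status line and returns exit code 0, 1, 2 or 3 (OK, WARNING, CRITICAL, UNKNOWN). The following is only a rough sketch of that convention. The service name APPQUEUE, the queue-depth metric, the thresholds and read_queue_depth() are all made up for illustration and do not describe any existing tool.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a plugin status (higher values are worse)."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(service, status, value, warn, crit):
    """Build the one-line plugin output: status text, then '|' and perfdata."""
    if value is None:
        return f"{service} {LABELS[status]} - no measurement available"
    return (f"{service} {LABELS[status]} - queue depth {value} "
            f"| depth={value};{warn};{crit}")


def read_queue_depth():
    """Hypothetical probe of an in-house app; replace with a real query."""
    return 42


def main():
    warn, crit = 100, 500  # hypothetical thresholds
    depth = read_queue_depth()
    status = evaluate(depth, warn, crit)
    print(format_result("APPQUEUE", status, depth, warn, crit))
    return status  # the monitoring system reads this as the exit code
```

A real plugin would end with sys.exit(main()) so that the status reaches the scheduler as the process exit code. The part after the "|" is performance data, which is one way a single check can feed both alerting and metric collection.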
I also spent some time looking at ZenOSS a while back, but at that point the product wasn't mature enough, and too many of my questions were met with "you can easily develop a script to do that" (like sending SMS notifications). I was going to take a look at Zabbix, but their site isn't resolving in DNS today. The last time I looked at OpenNMS it was really focused on network and SNMP monitoring and wasn't easily extensible for custom monitoring.

I have also looked at commercial solutions like HP's BTO suite and SolarWinds' products, but they don't really meet our needs (SolarWinds is too network-focused and their host monitoring tools are too immature).

On 26.12.2011 11:47, Michael Ryder wrote:

> What exactly is the definition of a "very large site?" Perhaps
> defining some parameters would be useful?
>
> Also, what are the requirements? Otherwise there's no way to do any
> comparison other than node, service or interface count.
>
> Mike
>
> On Mon, Dec 26, 2011 at 1:45 PM, Daniel Rich <[email protected]> wrote:
>
>> Not to hijack Tom's thread, but does anyone have any good experiences on tools to monitor a very large site? ...

-- 
Dan Rich <[email protected]>
http://www.employees.org/~drich/
"Step up to red alert!" "Are you sure, sir? It means changing the bulb in the sign..." - Red Dwarf (BBC)
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
