Have you given Xymon (previously known as Hobbit, previously known as Big Brother) and/or Zabbix a try? I prefer Xymon myself, as it does not require a database backend, so it has fewer points of failure.
-Charles

On Thu, Jan 12, 2012 at 7:17 PM, Kenneth Voort <[email protected]> wrote:

> I'm jumping in here since we're currently evaluating different
> monitoring / metric collection options for around 200 hosts (not
> including user machines) and have also evaluated Nagios / Cacti /
> Ganglia, and are similarly frustrated with the lack of a single
> (buzzword: turnkey) solution, FOSS or otherwise.
>
> So far we're quite happy with the extensibility of Nagios and the like -
> we write a lot of custom scripts/checks for it - but the lack of a
> coherent "dashboard" is what we're stuck on. Integration with a
> ticketing system (FogBugz) is also a major plus.
>
> So far, the best we can come up with is a set of scripts, or a compiled
> app, that imports/exports data from one system to another...
>
> On 12-01-11 5:27 PM, drich wrote:
> > Sorry, I fell off the face of the earth with our end-of-year closure and
> > vacation....
> >
> > "Large" at the moment is about 1000 hosts being monitored with a mix of
> > Nagios (formerly via GroundWork), Cacti, Ganglia, and a handful of
> > in-house developed tools. It could potentially expand to several
> > thousand hosts if we decide to start monitoring more than just our core
> > infrastructure.
> >
> > At a high level, the requirements are to be able to monitor the status
> > of hosts, network services, host resources, and applications; to display
> > this information via a web interface, to have management and
> > application owner "dashboards", to be able to notify of outages and
> > escalate those notifications, and in a perfect world to be able to open
> > tickets in our ticketing system. I would also like to see an API to both
> > manage the system and to give us a way to automate the addition and
> > deletion of the hosts and services being monitored. And the above should
> > have both a web-based and command-line configuration system. Having
> > integrated monitoring and metric collection would be a plus - i.e. not
> > having separate non-integrated applications like we do today with Nagios
> > and Cacti/Ganglia. And I want a pony! :-)
> >
> > "Hosts" in the above statement might be Linux, Windows, Mac, network
> > devices, storage nodes or embedded devices. "Services" includes
> > traditional network services like DNS, LDAP, SMTP, etc.; as well as
> > custom in-house developed applications that we will need to build custom
> > tools to monitor. "Applications" refers to end-to-end application
> > monitoring such as making JMX queries for Java-based applications or
> > running transactions against a web site.
> >
> > We have most of the above with our current system but it's a
> > non-integrated hack. I have looked at Opsview in the past and it may be
> > the best choice for us as we can migrate our existing system to it with
> > very little effort - but it still has the same limitations of Nagios
> > (primarily the lack of any application monitoring, just hosts and
> > services). I also spent some time looking at ZenOSS a while back, but at
> > that point the product wasn't mature enough and too many of my questions
> > were met with "you can easily develop a script to do that" (like send
> > SMS notifications). I was going to take a look at Zabbix, but their site
> > isn't resolving in DNS today. The last time I looked at OpenNMS it was
> > really focused on network and SNMP monitoring and wasn't easily
> > extensible for custom monitoring. I have also looked at commercial
> > solutions like HP's BTO suite and SolarWinds' products, but they don't
> > really meet our needs (SolarWinds is too network focused and their host
> > monitoring tools too immature).
> >
> > On 26.12.2011 11:47, Michael Ryder wrote:
> >
> >> What exactly is the definition of a "very large site?" Perhaps
> >> defining some parameters would be useful?
> >>
> >> Also, what are the requirements? Otherwise there's no way to do any
> >> comparison other than node, service or interface count.
> >>
> >> Mike
> >>
> >> On Mon, Dec 26, 2011 at 1:45 PM, Daniel Rich <[email protected]> wrote:
> >>> Not to hijack Tom's thread, but does anyone have any good experiences
> >>> on tools to monitor a very large site? ...
> >
> > --
> > Dan Rich <[email protected]>
> > http://www.employees.org/~drich/
> > /"Step up to red alert!" "Are you sure, sir?
> > It means changing the bulb in the sign..."/
> > - Red Dwarf (BBC)
> >
> > _______________________________________________
> > Discuss mailing list
> > [email protected]
> > https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
> > This list provided by the League of Professional System Administrators
> > http://lopsa.org/
>
> --
> Kenneth Voort
> Packet Fiend Extraordinaire
> kenneth (at) voort <killspam> ca
> FDF1 6265 EBAB C05C FD06 1AED 158E 14D6 37CD E87F | pgp encrypted email preferred
>
> Help! Help! I'm being repressed!
