Re: [lopsa-discuss] monitoring for a very large site?

Charles Jones Thu, 12 Jan 2012 18:28:55 -0800

Have you given Xymon (previously known as Hobbit, previously known as Big
Brother) and/or Zabbix a try?  I prefer Xymon myself, as it does not
require a database backend, so it has fewer points of failure.


-Charles

On Thu, Jan 12, 2012 at 7:17 PM, Kenneth Voort <[email protected]> wrote:

> I'm jumping in here since we're currently evaluating different
> monitoring / metric collection options for around 200 hosts (not
> including user machines) and have also evaluated Nagios / Cacti /
> Ganglia, and are similarly frustrated with the lack of a single
> (buzzword: turnkey) solution, FOSS or otherwise.
>
> So far we're quite happy with the extensibility of Nagios and the like -
> we write a lot of custom scripts/checks for it - but the lack of a
> coherent "dashboard" is what we're stuck on. Integration with a
> ticketing system (FogBugz) is also a major plus.
>
> So far, the best we can come up with is a set of scripts, or a compiled
> app, that imports/exports data from one system to another...
>
> On 12-01-11 5:27 PM, drich wrote:
> > Sorry, I fell off the face of the earth with our end-of-year closure and
> > vacation....
> >
> > "Large" at the moment is about 1000 hosts being monitored with a mix of
> > Nagios (formerly via GroundWork), Cacti, Ganglia, and a handful of
> > in-house developed tools. It could potentially expand to several
> > thousand hosts if we decide to start monitoring more than just our core
> > infrastructure.
> >
> > At a high level, the requirements are to be able to monitor the status
> > of hosts, network services, host resources, and applications; to display
> > this information via a web interface; to have management and
> > application owner "dashboards"; to be able to notify of outages and
> > escalate those notifications; and, in a perfect world, to be able to open
> > tickets in our ticketing system. I would also like to see an API to both
> > manage the system and to give us a way to automate the addition and
> > deletion of the hosts and services being monitored. And the above should
> > have both a web-based and command-line configuration system. Having
> > integrated monitoring and metric collection would be a plus - i.e. not
> > having separate non-integrated applications like we do today with Nagios
> > and Cacti/Ganglia. And I want a pony! :-)
> >
> > "Hosts" in the above statement might be Linux, Windows, Mac, network
> > devices, storage nodes or embedded devices. "Services" includes
> > traditional network services like DNS, LDAP, SMTP, etc., as well as
> > custom in-house developed applications that we will need to build custom
> > tools to monitor. "Applications" refers to end-to-end application
> > monitoring such as making JMX queries for Java-based applications or
> > running transactions against a web site.
> >
> > We have most of the above with our current system but it's a
> > non-integrated hack. I have looked at Opsview in the past and it may be
> > the best choice for us as we can migrate our existing system to it with
> > very little effort - but it still has the same limitations of Nagios
> > (primarily the lack of any application monitoring, just hosts and
> > services). I also spent some time looking at ZenOSS a while back, but at
> > that point the product wasn't mature enough and too many of my questions
> > were met with "you can easily develop a script to do that" (like sending
> > SMS notifications). I was going to take a look at Zabbix, but their site
> > isn't resolving in DNS today. The last time I looked at OpenNMS it was
> > really focused on network and SNMP monitoring and wasn't easily
> > extensible for custom monitoring. I have also looked at commercial
> > solutions like HP's BTO suite and SolarWinds' products, but they don't
> > really meet our needs (SolarWinds is too network-focused and its host
> > monitoring tools are too immature).
> >
> > On 26.12.2011 11:47, Michael Ryder wrote:
> >
> >> What exactly is the definition of a "very large site?"  Perhaps
> >> defining some parameters would be useful?
> >>
> >> Also, what are the requirements?   Otherwise there's no way to do any
> >> comparison other than node, service or interface count.
> >>
> >> Mike
> >>
> >> On Mon, Dec 26, 2011 at 1:45 PM, Daniel Rich <[email protected]> wrote:
> >>> Not to hijack Tom's thread, but does anyone have any good experiences
> >>> on tools to monitor a very large site? ...
> >
> >
> >
> > --
> > Dan Rich <[email protected]>
> > http://www.employees.org/~drich/
> > /"Step up to red alert!" "Are you sure, sir?
> > It means changing the bulb in the sign..."/
> >       - Red Dwarf (BBC)
> >
> >
> > _______________________________________________
> > Discuss mailing list
> > [email protected]
> > https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
> > This list provided by the League of Professional System Administrators
> >  http://lopsa.org/
>
> --
> Kenneth Voort
> Packet Fiend Extrodianaire
> kenneth (at) voort <killspam> ca
> FDF1 6265 EBAB C05C FD06 1AED 158E 14D6 37CD E87F | pgp encrypted email
> preferred
>
> Help! Help! I'm being repressed!
>

