Re: [lopsa-discuss] monitoring for a very large site?

drich Wed, 11 Jan 2012 14:29:19 -0800

 

Sorry, I fell off the face of the earth with our end-of-year closure
and vacation....

"Large" at the moment is about 1000 hosts being
monitored with a mix of Nagios (formerly via GroundWork), Cacti,
Ganglia, and a handful of in-house developed tools. It could potentially
expand to several thousand hosts if we decide to start monitoring more
than just our core infrastructure. 

At a high level, the requirements are to be able to monitor the status
of hosts, network services, host resources, and applications; to display
this information via a web interface; to have management and application
owner "dashboards"; to be able to notify of outages and escalate those
notifications; and, in a perfect world, to be able to open tickets in
our ticketing system. I would also like to see an API to both manage the
system and give us a way to automate the addition and deletion of the
hosts and services being monitored. All of the above should have both a
web-based and a command-line configuration system. Having integrated
monitoring and metric collection would be a plus, i.e. not having
separate non-integrated applications like we do today with Nagios and
Cacti/Ganglia. And I want a pony! :-)
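
As an illustration of the "automate the addition and deletion of hosts"
requirement, here is a minimal sketch that renders Nagios-style host
object definitions from an inventory. It assumes a Nagios-compatible
configuration format; the inventory dict, the function names, and the
"generic-host" template name are hypothetical and would need to match
the local setup.

```python
def render_host(host_name, address, template="generic-host"):
    """Render one Nagios-style host definition as text.

    The template name is an assumption; it must exist in the local config.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {host_name}",
        f"    address    {address}",
        "}",
    ]
    return "\n".join(lines) + "\n"


def render_hosts(inventory):
    """Render definitions for every host in a {name: address} mapping.

    Hosts are sorted by name so the generated file is stable between runs.
    """
    return "\n".join(
        render_host(name, address) for name, address in sorted(inventory.items())
    )
```

Regenerating the file from the inventory and reloading the monitoring
daemon covers both additions and deletions: a host dropped from the
inventory simply disappears from the output.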

"Hosts" in the above statement
might be Linux, Windows, Mac, network devices, storage nodes or embedded
devices. "Services" includes traditional network services like DNS,
LDAP, SMTP, etc., as well as custom in-house developed applications that
we will need to build custom tools to monitor. "Applications" refers to
end-to-end application monitoring such as making JMX queries for
Java-based applications or running transactions against a web site.
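
A "transaction against a web site" check can be sketched as a small
script that follows the Nagios plugin exit-code convention (0 = OK,
1 = WARNING, 2 = CRITICAL). This is only a sketch under assumptions:
the function names, the expected-text test, and the default thresholds
are hypothetical, and a real end-to-end check would usually walk through
several pages rather than one.

```python
import time
import urllib.error
import urllib.request

# Exit codes used by Nagios-style plugins.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(status, body, elapsed, expected_text, warn_seconds):
    """Map one HTTP result to a (exit_code, message) pair."""
    if status != 200:
        return CRITICAL, f"CRITICAL - HTTP {status}"
    if expected_text not in body:
        return CRITICAL, "CRITICAL - expected text not found"
    if elapsed > warn_seconds:
        return WARNING, f"WARNING - response took {elapsed:.2f}s"
    return OK, f"OK - response in {elapsed:.2f}s"


def check_url(url, expected_text, warn_seconds=2.0, timeout=10.0):
    """Fetch a URL once and classify the result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        status, body = exc.code, ""
    except (urllib.error.URLError, OSError) as exc:
        # No usable answer at all: DNS failure, refused connection, timeout.
        return CRITICAL, f"CRITICAL - request failed: {exc}"
    elapsed = time.monotonic() - start
    return classify(status, body, elapsed, expected_text, warn_seconds)
```

A wrapper would call check_url(), print the message, and exit with the
returned code so the monitoring system can pick up the state.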

We have most of the above with our current system but it's a
non-integrated hack. I have looked at Opsview in the past and it may be
the best choice for us as we can migrate our existing system to it with
very little effort - but it still has the same limitations as Nagios
(primarily the lack of any application monitoring, just hosts and
services). I also spent some time looking at Zenoss a while back, but at
that point the product wasn't mature enough and too many of my questions
were met with "you can easily develop a script to do that" (like send
SMS notifications). I was going to take a look at Zabbix, but their site
isn't resolving in DNS today. The last time I looked at OpenNMS it was
really focused on network and SNMP monitoring and wasn't easily
extensible for custom monitoring. I have also looked at commercial
solutions like HP's BTO suite and SolarWinds' products, but they don't
really meet our needs (SolarWinds is too network focused and their host
monitoring tools are too immature).

On 26.12.2011 11:47, Michael Ryder wrote:

> What exactly is the definition of a "very large site?" Perhaps
> defining some parameters would be useful?
>
> Also, what are the requirements? Otherwise there's no way to do any
> comparison other than node, service or interface count.
>
> Mike
>
> On Mon, Dec 26, 2011 at 1:45 PM, Daniel Rich <[email protected]> wrote:
>
>> Not to hijack Tom's thread, but does anyone have any good experiences on tools
>> to monitor a very large site? ...

-- 

Dan Rich <[email protected]>

http://www.employees.org/~drich/ [1]
 "Step up to red alert!" "Are you sure, sir?
 It means changing the bulb in the sign..."
 - Red Dwarf (BBC)

Links:
------
[1] http://www.employees.org/%7Edrich/

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
