On Thu, 28 Oct 2010 10:31:14 -0700, Christofer Hardy  
<[email protected]> wrote:

> What size shop are you? Small, Medium, Large?
> Do you need to get every page?
> Do you use SMS or Paging?

By LOPSA standards we're a low/mid shop. ~220 servers, just pushing past  
100TB storage. Since we're a largeish University (24K accounts) with a  
single campus, we don't have to worry about getting remote sites working  
together. We're HA-to-a-point, in that we don't have a formal on-call  
schedule (we have an informal one), and we pay more attention to service  
during crunch times (run up to and during finals) than fallow times  
(breaks).

We started out using Big Brother, but maintaining that infrastructure  
started getting tedious so we invested in Dartware Intermapper. The  
alerting logic leaves something to be desired (we've yet to be able to  
trigger an alert for "2 of 3 servers down" without resorting to  
custom-written probes) but it does support decent alerting options. Recent  
versions have started supporting Windows WMI probes, which will be handy  
for creating probes monitoring for specific service processes.
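
For what it's worth, the "2 of 3 servers down" logic itself is small. Below
is a rough Python sketch of the kind of custom probe I mean, not what we
actually run. The function names, the TCP-connect check, the port, and the
threshold are all assumptions for illustration, and how the result gets
handed back to the monitoring system (exit code, output text) depends on the
product, so that part is left out.

```python
import socket


def tcp_up(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (assumed check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def down_hosts(hosts, check=tcp_up):
    """Return the hosts in the group for which the check fails."""
    return [h for h in hosts if not check(h)]


def should_alert(hosts, threshold=2, check=tcp_up):
    """Alert only when at least `threshold` hosts in the group are down."""
    return len(down_hosts(hosts, check)) >= threshold
```

A wrapper script would call should_alert() with the three hostnames and
report the result in whatever form the monitoring tool expects. Passing the
check in as a parameter makes it easy to swap the TCP connect for an HTTP or
WMI test later.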

Am I happy with it? Not terribly. Because of its limitations, it isn't  
easy to build truly intelligent alerts  
(http://blog.serverfault.com/post/1264376462/intelligent-alerts). Because  
of the complexity in doing so, such alerts go stale far faster than we can  
muster the motivation to fix them, so the fixing doesn't get done.

- Greg Riedesel
_______________________________________________
Discuss mailing list
[email protected]
http://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
