On Mon, 2002-11-04 at 11:17, [EMAIL PROTECTED] wrote:
> Hello,
> 
> I'm working on a project to use ganglia to monitor a cluster of application
> servers. Time permitting, I'm interested in any thoughts you can share on
> the following issues.
> 
> 1) There are disk_total, disk_free and part_max_used metrics defined for
> Linux. We're building a storage-heavy cluster, with multiple volumes on each
> node. Are these three metrics sufficient? Can one report a vector metric? I
> suppose we could do that as text. Are there any other considerations you can
> think of for monitoring multiple disks?
> 
> 2) I mentioned this in a previous post, and I appreciate Matt pointing out
> that I can add any metrics I need via gmetric. What guiding principles
> should one use in adding failure metrics?
> Failures are rare, therefore not necessary to monitor them?
> Don't really care about failures themselves, the tip-off is loss of
> application performance?
> Do failures typically show up in metrics that are already present (e.g. if
> the one disk fails)?
> What other thoughts have you had about failure metrics?
> What top three failure metrics would you like to see for your clusters?
> 
> 3) Are you familiar with the NGOP project at FermiLab? If so, do you have
> any quick comments about that project vs. ganglia? You can find the Users
> Guide at http://www-isd.fnal.gov/ngop/. I don't know if it is open source.
> 
> Thanks for your thoughts.
> 
> Jonathan

Hi,

I'm from Fermilab and am the lead system administrator for the CMS
experiment in the Scientific Computing Support group.  We are currently
using ganglia for monitoring several logical clusters.  We do not use
ganglia as an alarm system because it was not really designed for
that.  We use it as a trending and analysis tool and for a first-blush
look at a node that may be experiencing problems.  We use NGOP for
monitoring critical system daemons and general machine availability.
Our alarm system runs entirely through NGOP.  NGOP is freely available
from the Fermitools site:

http://fermitools.fnal.gov/

Look under the "Misc. Software Tools" link.  It is not under an
Open Source license, but it is under a Fermilab license, which you should
read to determine whether it suits your needs.

Thanks, 

Joe 

-- 
===================================================================
Joe Kaiser - Systems Administrator                

Fermi Lab 
CD/OSS-SCS                Never laugh at live dragons.
630-840-6444
[EMAIL PROTECTED]                                                
===================================================================

