Once you get into a large number of RRD files, whether that's by having
thousands of metrics per node or thousands of nodes with a few metrics, you
need to deal with RRD's random access I/O.
One way to deal with that is to keep your RRD files on a RAM disk, and use
scripts to sync them to persistent storage. Another is to deploy rrdcached.
Check the archives of this mailing list (or the ganglia wiki) for advice on
how to set these up; I think there are some sync scripts there, too.
What "large" means depends on your I/O system, of course.
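To make the RAM disk approach a bit more concrete, here is a minimal sketch of
a sync script in Python. It is not one of the scripts from the list archives;
the function name and the directory layout are assumptions for illustration
only. It copies changed .rrd files from the RAM disk to persistent storage and
would be run periodically, for example from cron.

```python
import os
import shutil


def sync_rrds(ram_dir, persist_dir):
    """Copy changed .rrd files from ram_dir to the same path under persist_dir.

    Hypothetical helper for illustration. Returns the number of files copied.
    """
    copied = 0
    for root, _dirs, files in os.walk(ram_dir):
        rel = os.path.relpath(root, ram_dir)
        dest_root = os.path.normpath(os.path.join(persist_dir, rel))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            if not name.endswith(".rrd"):
                continue
            src = os.path.join(root, name)
            dst = os.path.join(dest_root, name)
            # Skip files that have not changed since the last sync
            # (copy2 preserves the modification time of the source).
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue
            # Copy to a temporary name, then rename, so an interrupted
            # copy never leaves a truncated RRD in the persistent copy.
            tmp = dst + ".tmp"
            shutil.copy2(src, tmp)
            os.replace(tmp, dst)
            copied += 1
    return copied
```

Something like sync_rrds("/dev/shm/ganglia/rrds", "/var/lib/ganglia/rrds")
would be the call, with both paths adjusted to your own setup. You also need
the reverse copy at boot, before gmetad starts, and a file copied while it is
being updated may not be consistent, so treat the persistent copy as a
best-effort backup.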
It's certainly *possible* that the problem is in the collection of the
metrics, but the most common failure is in updating the RRD files. I'd fix
that first before looking at gmond, gmetad, or your python modules.
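One rough way to check whether RRD updates are falling behind is to look for
RRD files that have not been written recently. The sketch below is only an
illustration: the function name and threshold are assumptions, and it assumes
a file's modification time reflects its last successful update, which may not
hold on every filesystem or RRDtool build.

```python
import os
import time


def find_stale_rrds(rrd_root, max_age_seconds, now=None):
    """Return sorted paths of .rrd files not modified within max_age_seconds.

    Hypothetical helper for illustration.
    """
    if now is None:
        now = time.time()
    stale = []
    for root, _dirs, files in os.walk(rrd_root):
        for name in files:
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(root, name)
            # A file older than the threshold has likely missed updates.
            if now - os.path.getmtime(path) > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```

Pick a threshold of a few collection intervals. If the stale list grows as
you add metrics, that points at the RRD update path rather than at collection.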
-- ReC
On Fri, Oct 15, 2010 at 9:18 AM, Stevens, Weston J <
[email protected]> wrote:
> I realize I should probably be a lot clearer.
>
> Our environment is Ganglia 3.1.7 with RRDTool version 1.2.23 on a 3-node
> physical cluster running CentOS release 5.3 (Final) on an x86 64-bit
> architecture. All 3 nodes on our cluster share the same RAID 10 disks
> (RAID 1 pairs cciss/c0d0 and cciss/c0d1, I believe).
>
> It would seem the more metrics we add, the more problems we get. Ganglia
> may be scalable for hundreds of nodes, but perhaps not hundreds of
> additional custom metrics (all written in C)?
>
> After adding dozens of metrics we encounter problems more and more, such as
> graphs (RRDs) missing, sporadic recording of data or data not being
> collected at all, and so forth. The head node never seems to have these
> problems, however, just the worker nodes. This may mean it's a
> networking issue. The built-in default metrics are not exempt from these
> problems: while they always work fine by themselves, they can be "messed
> with" by the new metrics, if that's the appropriate term.
>
> I wrote a script that reboots Ganglia until ALL the RRDs get created,
> otherwise gmetad will not always create them all on a single boot (sometimes
> it will). This seems to help a lot in kickstarting things and encouraging
> things to work, but I still seem to encounter problems where metrics are not
> getting collected from the worker nodes even with all the RRDs reporting for
> duty.
>
> Perhaps the traditional gmetric cron job is the thing to use? I mean, this
> new way of collecting custom metrics may still be unreliable?
>
>
> -----Original Message-----
> From: Jesse Becker [mailto:[email protected]]
> Sent: Friday, October 08, 2010 11:22 AM
> To: Stevens, Weston J
> Cc: [email protected]
> Subject: Re: [Ganglia-general] Booting Ganglia becoming a hassle
>
> On Fri, Oct 8, 2010 at 10:17, Stevens, Weston J <
> [email protected]> wrote:
> > We added a bunch of custom metric modules written in C and it is
> > sometimes taking 7 or 8 reboots of Ganglia to get it functioning 100%
> > (all the graphs to show and RRDs created). This does not happen when
> > we have only the default or just a few modules running on top of the
> > default; they are all there on the first boot. I was wondering whether
> > there are performance limitations I should know about here? Thanks
>
> There should not be any limits in this regard.
>
> For custom metrics, it will depend on how often they are collected and
> sent. If you collect a metric once a minute, you should not expect it to
> appear immediately, especially if you are sending rate data where you need
> two polling cycles to get the delta.
>
> How long are you waiting between restarts, and what is the collection
> interval?
>
>
> --
> Jesse Becker
>
>
> ------------------------------------------------------------------------------
> Download new Adobe(R) Flash(R) Builder(TM) 4
> The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly
> Flex(R) Builder(TM)) enable the development of rich applications that run
> across multiple browsers and platforms. Download your free trials today!
> http://p.sf.net/sfu/adobe-dev2dev
> _______________________________________________
> Ganglia-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>