I realize I should probably be a lot clearer. Our setup is Ganglia 3.1.7 with RRDTool version 1.2.23 on a 3-node physical cluster running CentOS release 5.3 (Final) on a 64-bit x86 architecture. I believe all 3 nodes on our cluster share the same RAID 10 disks (RAID 1 pairs cciss/c0d0 and cciss/c0d1).
It would seem that the more metrics we add, the more problems we get. Ganglia may be scalable to hundreds of nodes, but perhaps not to hundreds of additional custom metrics (all written in C)? After adding dozens of metrics we encounter problems more and more often, e.g. graphs (RRDs) missing, data being recorded only sporadically or not being collected at all, and so forth. The head node never seems to have these problems, however, just the worker nodes, so this may mean it's a networking issue. The built-in default metrics are not exempt from these problems: while they always work fine by themselves, they can be "messed with" by the new metrics, if that's the appropriate term.

I wrote a script that reboots Ganglia until ALL the RRDs get created; otherwise gmetad will not always create them all on a single boot (sometimes it will). This seems to help a lot in kickstarting things, but I still encounter problems where metrics are not getting collected from the worker nodes even with all the RRDs reporting for duty.

Perhaps the traditional gmetric cron job is the thing to use? I mean, this new way of collecting custom metrics may still be unreliable?

-----Original Message-----
From: Jesse Becker [mailto:[email protected]]
Sent: Friday, October 08, 2010 11:22 AM
To: Stevens, Weston J
Cc: [email protected]
Subject: Re: [Ganglia-general] Booting Ganglia becoming a hassle

On Fri, Oct 8, 2010 at 10:17, Stevens, Weston J <[email protected]> wrote:
> We added a bunch of custom metric modules written in C and it is
> sometimes taking 7 or 8 reboots of Ganglia to get it functioning 100%
> (all the graphs to show and RRDs created). This does not happen when
> we have only the default or just a few modules running on top of the
> default, they are all there on the first boot. I was wondering are
> there performance limitations I should know about here? Thanks

There should not be any limits in this regard.
For custom metrics, it will depend on how often they are collected and sent. If you collect a metric once a minute, you should not expect it to appear immediately, especially if you are sending rate data, where you need two polling cycles to get the delta. How long are you waiting between restarts, and what is the collection interval?

--
Jesse Becker

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
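To illustrate the point above about rate data needing two polling cycles, here is a minimal Python sketch. It is not Ganglia code: the `RateTracker` class, its `update` method, and the sample counter values are all hypothetical, and the 60-second spacing simply mirrors the once-a-minute example. It only shows why the first poll after a restart cannot yield a value for a rate metric.

```python
from typing import Optional


class RateTracker:
    """Hypothetical helper: turns a growing counter into a per-second rate."""

    def __init__(self) -> None:
        self._last_value: Optional[float] = None
        self._last_time: Optional[float] = None

    def update(self, value: float, timestamp: float) -> Optional[float]:
        """Record one poll; return the rate, or None if it cannot be computed yet."""
        prev_value, prev_time = self._last_value, self._last_time
        self._last_value, self._last_time = value, timestamp

        # First poll after a (re)start: there is nothing to take a delta against.
        if prev_value is None or prev_time is None:
            return None

        elapsed = timestamp - prev_time
        # No elapsed time, or the counter went backwards (e.g. it was reset):
        # skip this cycle rather than report a bogus rate.
        if elapsed <= 0 or value < prev_value:
            return None

        return (value - prev_value) / elapsed


# Assumed sample data: a counter polled once a minute.
tracker = RateTracker()
first = tracker.update(1000, timestamp=0.0)    # first cycle: no rate yet
second = tracker.update(1600, timestamp=60.0)  # second cycle: delta available
print(first, second)
```

With a one-minute collection interval, a metric of this kind would have nothing to report until the second poll, so restarting before then would reset the wait each time.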

