>>> On 11/7/2009 at 12:06 AM, in message
<20091107070643.ga20...@porcupine.cita.utoronto.ca>, Robin Humble
<robin.humble+gang...@anu.edu.au> wrote:
> Hi,
> 
> I spoof a bunch of temperature and power metrics via ILOM for a few
> hundred nodes and I noticed that gmetad wasn't making a summary table
> (.../__SummaryInfo__/*) for most of the spoof'd values.
> 
> turns out that there's a SPOOF_HOST EXTRA_ELEMENT attached to each
> spoof'd metric, and when 100's of hosts (>40 or so should trigger it)
> have spoof'd entries, then those add up and then corrupt the summary
> Metric structure enough to destroy the .type and stop the rrd being
> generated.
> I'm guessing it's the same as the MAX_EXTRA_ELEMENTS problem, except
> for the summary table instead of the host table.
> 
> attached is a simplistic patch that fixes the problem.
> it could probably be done better, but works for me. it's against 3.1.2,
> but should apply to 3.1.4 as well.
> 
> apologies if I have some of the ganglia/gmetad terminology wrong - I've
> been using it for years, but this is my first dive into the code.
> 

I took a look at this patch, but since I am not able to reproduce the
problem, it is unclear to me what is actually happening.  I can't figure out
how this patch fixes a problem with the hash table.  According to the source
code, whenever an extra element is parsed, the code inserts it into a
per-metric list of extra data.  This means that only one extra element for a
spoof host is ever stored for a metric.  Then, when the code moves on to the
summary data portion, it specifically checks that it is not duplicating an
extra element value before inserting it into the summary node (see the for
loop at around line 827 in the 3.1.2 version of the source code).  If it
detects a duplicate value, it skips the insert and just updates the rest of
the summary node in the hash table.  Since I am not able to reproduce the
problem, could you step further through the original source code to make
sure that the duplicate check is actually happening and that the code is not
taking some other path that could be causing the problem?
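
To make the question concrete, here is a minimal sketch in C of the kind of
check described above.  It is not the gmetad code: every name in it
(sketch_summary_t, sketch_add_extra, SKETCH_MAX_EXTRA and so on) is
hypothetical, and the fixed-size array is only an assumption about how the
summary node might store its extra elements.  It shows a value-based
duplicate check followed by a bounds check.  Under those assumptions, if
each spoofed host carried a different SPOOF_HOST value, none of them would
count as duplicates, so the bounds check would be the only thing keeping the
inserts from running past the array into neighbouring fields such as the
type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names and sizes throughout; not the real gmetad structures. */
#define SKETCH_MAX_EXTRA 4
#define SKETCH_STR_LEN   32

typedef struct {
    char name[SKETCH_STR_LEN];
    char value[SKETCH_STR_LEN];
} sketch_extra_t;

typedef struct {
    size_t n_extra;                          /* slots of extra[] in use */
    sketch_extra_t extra[SKETCH_MAX_EXTRA];  /* fixed-size extra data   */
    int type;                                /* stands in for .type     */
} sketch_summary_t;

/* Copy src into a SKETCH_STR_LEN buffer, always NUL-terminating. */
static void sketch_copy(char *dst, const char *src)
{
    strncpy(dst, src, SKETCH_STR_LEN - 1);
    dst[SKETCH_STR_LEN - 1] = '\0';
}

/*
 * Add one extra element to a summary node.
 * Returns  1 if the element was inserted,
 *          0 if an identical name/value pair was already present,
 *         -1 if the array is full and the element was dropped.
 */
int sketch_add_extra(sketch_summary_t *s, const char *name, const char *value)
{
    size_t i;

    assert(s != NULL && name != NULL && value != NULL);

    /* Duplicate check: skip the insert if this exact pair is stored. */
    for (i = 0; i < s->n_extra; i++) {
        if (strcmp(s->extra[i].name, name) == 0 &&
            strcmp(s->extra[i].value, value) == 0)
            return 0;
    }

    /* Bounds check: without it, distinct values would overrun extra[]. */
    if (s->n_extra >= SKETCH_MAX_EXTRA)
        return -1;

    sketch_copy(s->extra[s->n_extra].name, name);
    sketch_copy(s->extra[s->n_extra].value, value);
    s->n_extra++;
    return 1;
}
```

With a zero-initialised sketch_summary_t, adding the same SPOOF_HOST value
twice inserts it only once, while adding more distinct values than
SKETCH_MAX_EXTRA makes the later calls return -1 and leaves type untouched.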

You might also want to check the source code at the point where the summary
table is actually written, to see whether there is any clue as to why your
summary rrd files are not being created or updated.

Brad



_______________________________________________
Ganglia-developers mailing list
Ganglia-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
