i just checked into CVS what i believe could be a fix for the CPU hogging
bug that you've seen in gmond when the cluster is rebooted.  i thought it 
would be easier for you to just use the latest CVS source than to try all 
the suggestions i gave earlier (of course you can still do them if you 
like ;) )

the main changes to the code are in ./monitor-core/gmond/gmond.c .. take a 
look at the new send_all_metric_data() function.  it is a little more 
complex but A LOT less likely to cause a UDP storm when all the gmonds 
start at the same time.
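
to illustrate the general idea only (this is NOT the code from gmond.c, just 
a minimal sketch which assumes a storm like this can be avoided by having 
each host wait a random amount of time before it sends; the function name 
startup_jitter_ms and both of its parameters are made up for this example):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of gmond: pick a delay in
 * [0, max_delay_ms) from a per-host seed, so that hosts which boot
 * together do not all send their metrics in the same instant. */
unsigned int startup_jitter_ms(unsigned int seed, unsigned int max_delay_ms)
{
    assert(max_delay_ms > 0);   /* a zero-width window makes no sense */
    srand(seed);                /* seed should differ from host to host */
    return (unsigned int)rand() % max_delay_ms;
}
```

a daemon could call this once at startup with something host-specific as the 
seed (its pid mixed with the current time, say) and sleep for the returned 
number of milliseconds before its first send.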

sorry for the problems you've been experiencing... 

.. i'm getting ready to go camping until tomorrow morning.. so if you 
don't hear back from me it's because i'm looking for orion in the sky and 
drinking a cold one.

bottoms up
-matt

Today, Steven A. DuChene wrote forth saying...

> On Fri, Oct 11, 2002 at 12:30:21AM -0700, matt massie wrote:
> > Today, Brian Messenger wrote forth saying...
> > 
> > > Yes, a friend asked about this,  another person replied with this:
> > > 
> > > "There have been complaints about Ganglia occasionally going into heavy
> > > CPU use and it is a known bug.  I am not sure if it has been fixed.  I
> > > saw some stuff on it a few weeks ago."
> > 
> > the person who was having the problem was running version 2.2.3.  i've
> > only seen one complaint about it so if anyone else on this list has
> > problems with gmond sucking up a lot of CPU please let me know.
> 
> It was not just one person. Our oscar core team seems to be able to reproduce
> this problem on a frequent basis. See the bug report at:
> 
> https://sourceforge.net/tracker/index.php?func=detail&aid=602940&group_id=9368&atid=109368
> 
> > i suspect the problem was in the way heartbeats were handled in pre 2.3.0
> > but i haven't formally debugged the problem.  he found that restarting 
> > gmond fixed the problem.. a better fix is to run 2.5.0 :).
> > 
> > -matt 
> 
> There is a fairly large user base of OSCAR installs using 2.2.3 so it is not
> trivial to go around to all of those and replace the version of ganglia.
> 


