On Tue, Sep 02, 2008 at 12:54:07PM -0400, Ofer Inbar wrote:
> Brad Nicholes <[EMAIL PROTECTED]> wrote:
> > Thanks Carlo, this is some good feedback.  I know that both Bernard
> > and Cos have reported having issues with this bug.  Could either (or
> > both) of you independently confirm that this patch fixes the problem?
> 
> To reproduce this bug, I'd need a host in a state where it accepts TCP
> connections but then leaves them hung, which is not something I want
> to do on any of my production hosts,

it shouldn't be a problem at all if your failover sources are set up correctly
anyway.

you don't need to crash the machine; just stop the gmond process by
running something like:

  # kill -STOP `pidof gmond`

to resume it after you are done, run:

  # kill -CONT `pidof gmond`
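
The two commands above can be wrapped in small helpers so a test run is easy to
repeat and to verify. This is only a sketch: the function names pause_pids,
resume_pids and is_stopped are made up for illustration and are not part of
Ganglia. It assumes a Linux host with a ps that accepts "-o stat= -p PID".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Ganglia.

# Suspend the given PIDs with SIGSTOP. The suspended daemon keeps its
# listening socket open but stops reading from it or replying.
pause_pids() {
    kill -STOP "$@"
}

# Resume PIDs that were suspended by pause_pids.
resume_pids() {
    kill -CONT "$@"
}

# Succeed if ps reports the process in the stopped ("T") state.
is_stopped() {
    state=$(ps -o stat= -p "$1" | tr -d ' ')
    case "$state" in
        T*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

After sourcing the helpers, something like "pause_pids `pidof gmond`" freezes
gmond, is_stopped confirms each PID really is suspended before you look at
gmetad, and "resume_pids `pidof gmond`" puts things back.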

you will need a patched gmetad though, but it doesn't need to be the same one
you have in production, even if I'd expect you to roll it out there quickly if
this problem is really a showstopper for your 3.1 production deployment, as
Brad seemed to think.

> If anyone out there on the list has a way to set up a Ganglia
> testing cluster and then deliberately put one of the data sources in
> this state, wanna test out this patch?

that is what I did, but I have to admit that my test environment was tiny, as I
only used 1 linux box (my gentoo linux workstation) and 1 windows box (a
windows vista box where I build my windows ganglia binaries) configured
together in one single cluster running 3.1 (the failover source wasn't set up
correctly though, as I don't have a way to synchronize the clocks between the
two, they are in different VLANs, and my little linksys switch can't do
multicast routing).

Brad is probably looking for someone else to come up with a more realistic,
production-like test, but if no one can do that, I might be able to configure
one by moving around some cables and trying to set up a more realistic failover
scenario (running linux on the windows box), even if that probably defeats the
"independent confirmation" part of the testing request.

Carlo

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
