Hello,

I would like to use gexec on our 3000+ node cluster, but I have noticed 
that if even one machine in the GEXEC_SVRS list is down or is not 
running gexecd, the entire run fails to return a result for any of the 
machines. Is this the expected behavior?

I have been trying to modify the gexec code so that it tolerates downed 
machines, but this has proved to be quite complicated. Would anyone have 
any insight on this issue?

IMHO, it should be possible to check that the machines in GEXEC_SVRS 
are running and, if not, have them removed from the gexec execution, 
but so far I have not found an easy/efficient way to do this.
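
To make the idea concrete, here is a rough sketch in Python of the kind of
pre-filtering I have in mind. It is not part of gexec. It assumes that
GEXEC_SVRS is a whitespace-separated list of hostnames and that gexecd
listens on TCP port 2875; both are assumptions to check against the local
install. The names is_up and live_servers are made up for illustration.

```python
import os
import socket
from concurrent.futures import ThreadPoolExecutor

GEXECD_PORT = 2875  # assumed gexecd TCP port; adjust to match your install


def is_up(host, port=GEXECD_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def live_servers(hosts, port=GEXECD_PORT, timeout=2.0, workers=256):
    """Probe hosts in parallel and return the reachable ones, order preserved."""
    hosts = list(hosts)
    if not hosts:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(hosts))) as pool:
        flags = list(pool.map(lambda h: is_up(h, port, timeout), hosts))
    return [h for h, ok in zip(hosts, flags) if ok]


if __name__ == "__main__":
    # Assumes GEXEC_SVRS is a whitespace-separated host list.
    hosts = os.environ.get("GEXEC_SVRS", "").split()
    print(" ".join(live_servers(hosts)))
```

If this were saved as live_servers.py (a hypothetical name), the filtered
list could be put back into the environment before running gexec as usual:

```
export GEXEC_SVRS="$(python3 live_servers.py)"
```

This only shows that something accepts connections on that port, and a node
can still go down between the probe and the actual run, so it narrows the
problem rather than solving it inside gexec itself.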

Any help would be much appreciated,

Ali

