Hello, I would like to use gexec on our 3000+ node cluster. However, I have noticed that if even one machine in the GEXEC_SVRS list is down, or gexecd is not running on it, the entire run fails to return a result for any of the machines. Is this the expected behavior?
I have been trying to modify the gexec code to tolerate downed machines, but it has proved to be quite complicated. Does anyone have any insight on this issue? IMHO, it should be possible to check that the machines in GEXEC_SVRS are running and, if not, remove them from the gexec execution, but so far I have not found an easy/efficient way to do this.

Any help would be much appreciated,
Ali
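
One way to approximate the pre-filtering idea without touching the gexec source is to probe each host from a small wrapper script and pass only the responsive ones on. The following is a minimal sketch, not part of gexec itself. It assumes that GEXEC_SVRS is a whitespace-separated list of hostnames and that gexecd accepts TCP connections on port 2875; the port value and the names `is_reachable` and `filter_live_servers` are illustrative assumptions, so adjust them to the local setup. Probes run in parallel so that a 3000+ node list does not take thousands of timeouts in sequence.

```python
import os
import socket
from concurrent.futures import ThreadPoolExecutor

# Assumption: the TCP port gexecd listens on; change it to match your install.
GEXECD_PORT = 2875


def is_reachable(host, port=GEXECD_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def filter_live_servers(servers, port=GEXECD_PORT, timeout=2.0,
                        workers=64, probe=is_reachable):
    """Probe every host in a whitespace-separated list in parallel and
    return the hosts that answered, in their original order."""
    hosts = servers.split()
    if not hosts:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(hosts))) as pool:
        alive = list(pool.map(lambda h: probe(h, port, timeout), hosts))
    return [h for h, ok in zip(hosts, alive) if ok]


if __name__ == "__main__":
    # Print the filtered list so a shell wrapper can re-export GEXEC_SVRS.
    live = filter_live_servers(os.environ.get("GEXEC_SVRS", ""))
    print(" ".join(live))
```

The printed list can be exported as the new GEXEC_SVRS before invoking gexec. A successful connection only shows that something is listening on the port, and a node can still go down between the probe and the actual run, so this narrows the failure window without closing it.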