Hi.

I am new to Ganglia. I have a problem and would be much obliged if
someone could give me a hint on how to deal with it.

I installed Ganglia 3.0.5 from an RPM for SUSE 10 SP2 that I downloaded
from one of the RPM repositories. It is running on a small cluster of 2
nodes, 16 CPUs total. I did not modify the default setup beyond what was
absolutely necessary to get everything working. I see what looks like
interference between one of the Ganglia components and MPICH.

Here are my observations:

1) At some point my MPICH application was running fine, but Ganglia
was not showing information from node 2. It turned out that gmond was
not running. I started it, and after a very short while my MPICH
application hung.

2) I restarted the application. It ran for a while and then hung
again. I killed gmond and gmetad on both nodes, and the application
immediately resumed and continued running. It looked as if MPI messages
had been held up.

 

Has anyone seen anything like this, or does anyone know of a mechanism
by which this behavior could be triggered?

Thanks.

Misha Sushchik

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
