Hi All,

I'm new to the ganglia/gexec community and am interested in a few basics to start:

I have set up a 16-node 2-CPU cluster for ganglia/gexec testing, running SuSE 9.1 w/ the 2.6.4-52-bigsmp kernel. So far all seems to be running fine and I get the expected results.

First, is there a way to characterize the hosts so that gexec/gmond see each of them as multiple systems? For example, when I submit gexec -n 17 hostname I get "Not enough hosts available", although there are 32 CPUs available. My applications require fairly loaded (in the memory sense) servers, so I tend to use each CPU as a separate system.

Also, as the CPUs in this cluster are hyperthreaded, the hosts are reported as 4-CPU machines...

Second, what mechanism does gmond use to sense load on each system? I would rather not paw through the source to find out. I need to set up nearly instantaneous load reporting, a la vmstat, in order to properly assign jobs to candidate machines, without getting SGE-style host pileup effects ;-)

As a test, I submitted 4 large jobs via gexec (as gexec -n 1 jobname) in not-so-rapid succession, and they all ended up on the same host, so I'm assuming there is some lag in gmond reporting the least-loaded target host. Any ideas on improving this?
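
To make the suspected lag concrete, here is a toy Python model of the pileup. It is only a sketch of my assumption, not gmond or gexec code: the function assign_jobs, the refresh_every parameter, and the node names are all made up for illustration, and "load" is just a counter bumped by 1.0 per job.

```python
def assign_jobs(loads, n_jobs, refresh_every):
    """Place n_jobs one at a time on the host with the lowest *reported* load.

    loads:         dict of host name -> starting load (hypothetical values)
    refresh_every: number of submissions between refreshes of the reported
                   snapshot (1 means the scheduler always sees fresh data)
    """
    true_load = dict(loads)      # what the hosts are really doing
    reported = dict(true_load)   # what the scheduler believes
    placement = []
    for i in range(n_jobs):
        if i % refresh_every == 0:
            reported = dict(true_load)  # snapshot catches up with reality
        # Pick the least-loaded host as reported; ties go to the first name.
        target = min(sorted(reported), key=reported.get)
        true_load[target] += 1.0        # the job starts loading the host
        placement.append(target)
    return placement


hosts = {f"node{i:02d}": 0.0 for i in range(1, 5)}
stale = assign_jobs(hosts, 4, refresh_every=4)  # snapshot never updates in time
fresh = assign_jobs(hosts, 4, refresh_every=1)  # snapshot updates every job
print(stale)
print(fresh)
```

With a stale snapshot all four jobs land on node01, which matches what I saw; with a fresh snapshot for every submission they spread over four nodes. Whether this is really what happens inside gmond/gexec is exactly what I am asking.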

All in all, this is a great project and I look forward to participating in the future.

Regards,

Mike

