Hi All,
I'm new to the ganglia/gexec community and have a few basic questions
to start:
I have set up a 16-node 2-CPU cluster for ganglia/gexec testing,
running SuSE 9.1 w/ the 2.6.4-52-bigsmp kernel. So far all seems to be
running fine and I get the expected results.
First, is there a way that one can characterize the hosts so that
gexec/gmond see them as multiple systems? In other words, when I try to
submit gexec -n 17 hostname I get "Not enough hosts available",
although there are 32 CPUs available. My applications require fairly
loaded (in the memory sense) servers, so I tend to use each CPU as a
separate system.
Also, as the CPUs in this cluster are hyperthreaded, the hosts are
reported as 4-CPU machines...
Second, what mechanism does gmond use to sense load on each system? I'd
rather not paw through the source to find out. I need to set up nearly
instantaneous load reporting, a la vmstat, in order to properly assign
jobs to candidate machines, without getting SGE-style host pileup
effects ;-)
As a test, I submitted 4 large jobs via gexec (as gexec -n 1 jobname)
in not-so-rapid succession, and they all ended up on the same host,
so I'm assuming there is some lag in gmond reporting the least-loaded
target host. Any ideas on improving this?
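
To make the vmstat-style idea concrete, here is a rough sketch of the kind
of selection I have in mind. It assumes the Linux /proc/loadavg format, where
the fourth field is the current runnable/total task count. The function
names and the sample values are made up for illustration and are not taken
from gmond or gexec.

```python
def parse_loadavg(line):
    """Parse one line in Linux /proc/loadavg format."""
    fields = line.split()
    one_min = float(fields[0])
    # Fourth field is "runnable/total" scheduling entities, sampled right now,
    # so it reacts much faster than the 1-minute average in the first field.
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {"load_one": one_min, "runnable": runnable, "total": total}


def least_loaded(samples):
    """Return the host with the fewest currently runnable tasks.

    Ties are broken by the 1-minute load average.
    """
    return min(
        samples,
        key=lambda host: (samples[host]["runnable"], samples[host]["load_one"]),
    )


if __name__ == "__main__":
    # Hypothetical samples: node01 just received jobs, so its 1-minute
    # average is still low even though it already has work running.
    raw = {
        "node01": "0.08 0.02 0.01 3/120 4321",
        "node02": "0.10 0.05 0.01 1/118 4100",
    }
    samples = {host: parse_loadavg(line) for host, line in raw.items()}
    print(least_loaded(samples))
```

Ranking on the 1-minute average alone would pick node01 again in this
example, which looks like the pileup I saw. Ranking on the runnable count
first picks node02 instead.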
All in all, this is a great project and I look forward to participating
in the future.
Regards,
Mike