Hi all,

Not sure if this is an OpenMPI query or a PLPA query, but given that PLPA seems to have some support for it already I thought I'd start here. :-)
We run a quad-core Opteron cluster with Torque 2.3.x, which uses the kernel's cpuset support to constrain a job to just the cores it has been allocated. However, we occasionally see that when a job has been allocated multiple cores on the same node, two compute-bound MPI processes from the job get scheduled onto the same core (obviously a kernel issue).

CPU affinity would be an obvious solution, but it needs to be done with reference to the cores that are available to the job in its cpuset. This information is already retrievable by PLPA (for instance, "plpa-taskset -cp $$" will retrieve the cores allocated to the shell you run the command from), but I'm not sure whether OpenMPI makes use of it when binding CPUs using the linux paffinity MCA parameter.

Our testing (with 1.3.2) seems to show it doesn't, and I don't think there are any significant differences in the 1.4 snapshots. Am I correct in this? If so, are there any plans to make it do this?

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
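P.S. To make the request concrete, here is a rough sketch of the behaviour I have in mind, written in Python rather than against PLPA or the OpenMPI internals. It is only an illustration under some assumptions: the function names pick_core and bind_to_allowed_core are made up for this example, the affinity mask reported by the kernel is assumed to reflect the job's cpuset, and the OMPI_COMM_WORLD_LOCAL_RANK environment variable is assumed to be exported by the launcher (please check whether your version sets it). The idea is simply to index into the list of cores the process is allowed to use, instead of counting from core 0.

```python
import os


def pick_core(allowed_cores, local_rank):
    """Choose a core for the local_rank-th process on this node.

    allowed_cores is the set of core IDs the process may run on (for
    example, the cores in its cpuset). Ranks wrap around if there are
    more processes than allowed cores.
    """
    cores = sorted(allowed_cores)
    if not cores:
        raise ValueError("no usable cores in the allowed set")
    return cores[local_rank % len(cores)]


def bind_to_allowed_core(local_rank):
    """Bind the calling process to one core taken from its current mask.

    Hypothetical helper for illustration only; Linux-specific.
    """
    # Ask the kernel which cores we are currently allowed to run on.
    allowed = os.sched_getaffinity(0)
    core = pick_core(allowed, local_rank)
    # Restrict the process to that single core.
    os.sched_setaffinity(0, {core})
    return core


if __name__ == "__main__":
    # Assumption: the launcher exports the node-local rank in this variable.
    rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    print("bound to core", bind_to_allowed_core(rank))
```

With a cpuset of cores 2, 3, 6 and 7, local ranks 0 to 3 would land on cores 2, 3, 6 and 7 respectively, rather than on cores 0 to 3, which may not belong to the job at all.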