On Aug 17 2009, Ralph Castain wrote:
> The problem is that the two mpiruns don't know about each other, and
> therefore the second mpirun doesn't know that another mpirun has
> already used socket 0.
> We hope to change that at some point in the future.
It won't help. The problem is less likely to be two jobs running OpenMPI
programs (that have been recently linked!) than other tasks that are not
OpenMPI at all. I have mentioned daemons, kernel threads and so on, but
think also of shared-memory parallel programs (OpenMP etc.); a LOT of
applications nowadays include some sort of threading.
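As a toy illustration of both points (this is a sketch, not Open MPI's
actual placement logic; the function names, the 8-core node and the
OpenMP thread placement are all made up for the example): two launchers
that each bind from core 0 pile onto the same socket, and even launchers
that coordinate with each other still collide with a threaded program
they cannot see.

```python
from collections import Counter

NUM_CORES = 8  # hypothetical node: 2 sockets x 4 cores


def naive_bind(num_ranks, first_core=0):
    """Toy launcher: pin rank i to core first_core + i, ignoring other jobs."""
    return [first_core + rank for rank in range(num_ranks)]


def core_load(*placements):
    """Count how many busy processes or threads land on each core."""
    return Counter(core for placement in placements for core in placement)


def oversubscribed(load):
    """Cores carrying more than one busy process or thread."""
    return sorted(core for core in range(NUM_CORES) if load[core] > 1)


# Case 1: two mpiruns that know nothing about each other.
job_a = naive_bind(4)  # fills socket 0 (cores 0-3)
job_b = naive_bind(4)  # unaware of job_a, also starts at core 0
print("uncoordinated:", oversubscribed(core_load(job_a, job_b)))

# Case 2: the mpiruns coordinate, but a non-MPI threaded program is
# invisible to both. Its placement here is an assumption for illustration.
job_b_coordinated = naive_bind(4, first_core=4)
openmp_threads = [0, 1, 2, 3]
print("coordinated:  ",
      oversubscribed(core_load(job_a, job_b_coordinated, openmp_threads)))
```

In both cases cores 0-3 end up doubly loaded. In the second case the
bound MPI ranks cannot move out of the way, which is the argument against
binding by default on a shared system.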
For the ordinary multi-user system, you don't want any form of binding. The
scheduler is rickety enough as it is, without confusing it further. That
may change as the consequences of serious levels of multiple cores force
that area to be improved, but don't hold your breath. And I haven't a clue
which of the many directions scheduler design will go!
I agree that having an option, and having it easy to experiment with, is the
right way to go. What the default should be is very much less clear.
Regards,
Nick Maclaren.