Maybe this is related to Reuti's "-hostfile ignored in 1.6.1" thread on
the users mailing list, but I'm not sure.
Let's pretend my nodes are called local, r1, and r2. That is, I launch
mpirun from "local" and there are two other (remote) nodes available to
me. With the trunk (e.g., v1.9 r27247), I get
% mpirun --bynode --nooversubscribe --host r1,r1,r1,r2,r2,r2 -n 6 --tag-output hostname
[1,0]<stdout>:r1
[1,1]<stdout>:r2
[1,2]<stdout>:r1
[1,3]<stdout>:r2
[1,4]<stdout>:r1
[1,5]<stdout>:r2
which seems right to me. But when the local node is involved:
% mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 4 --tag-output hostname
[1,0]<stdout>:local
[1,1]<stdout>:r1
[1,2]<stdout>:r1
[1,3]<stdout>:r1
% mpirun --bynode --nooversubscribe --host local,local,local,r1,r1,r1 -np 5 --tag-output hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 5 slots
that were requested by the application:
hostname
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
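I'm not seeing all the local slots I should be seeing. It looks like
only one of the three local slots is counted: with -np 4, a single
process lands on local and the other three go to r1, and -np 5 fails
outright even though six slots were listed. We're seeing wide-scale MTT
trunk failures due to this problem.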
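To make the expected behavior concrete, here is a minimal Python sketch
of by-node round-robin mapping. It is not Open MPI's mapper code: the
function name map_bynode is hypothetical, and the rule that each
repetition of a name in the --host list counts as one slot is an
assumption taken from the r1/r2 run above.

```python
from collections import Counter


def map_bynode(hosts, nprocs):
    """Assign ranks to nodes round-robin, never exceeding a node's slots.

    Assumption: each occurrence of a name in `hosts` is one slot.
    """
    slots = Counter(hosts)
    # Keep nodes in the order they first appear in the host list.
    nodes = list(dict.fromkeys(hosts))
    if nprocs > sum(slots.values()):
        # Mirrors --nooversubscribe: refuse rather than oversubscribe.
        raise ValueError("not enough slots for %d processes" % nprocs)

    used = Counter()
    mapping = []
    while len(mapping) < nprocs:
        for node in nodes:
            if len(mapping) == nprocs:
                break
            if used[node] < slots[node]:
                used[node] += 1
                mapping.append(node)
    return mapping


remote_only = "r1,r1,r1,r2,r2,r2".split(",")
with_local = "local,local,local,r1,r1,r1".split(",")

print(map_bynode(remote_only, 6))  # ['r1', 'r2', 'r1', 'r2', 'r1', 'r2']
print(map_bynode(with_local, 4))   # ['local', 'r1', 'local', 'r1']
print(map_bynode(with_local, 5))   # ['local', 'r1', 'local', 'r1', 'local']
```

Under that assumption the sketch reproduces the r1/r2 output exactly,
but for the local case it alternates local and r1 and has room for five
processes, which is not what the trunk does.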
There is a similar loss of local slots with hostfile syntax. E.g.,
% hostname
local
% cat hostfile
local
r1
% mpirun --hostfile hostfile -n 2 hostname
--------------------------------------------------------------------------
A hostfile was provided that contains at least one node not
present in the allocation:
hostfile: hostfile
node: local
If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.
--------------------------------------------------------------------------
The problem is solved with "--mca orte_default_hostfile hostfile".
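For the hostfile case above, a rough Python sketch of the slot count I
would expect is below. It is a simplification, not the real hostfile
parser: the helper name count_hostfile_slots is hypothetical, and it
assumes one slot per line unless a "slots=N" field is given, ignoring
everything else the hostfile syntax supports.

```python
def count_hostfile_slots(text):
    """Return {node: slots} for a simplified hostfile.

    Assumption: a line contributes 1 slot unless it carries slots=N.
    """
    slots = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        name = fields[0]
        count = 1
        for field in fields[1:]:
            if field.startswith("slots="):
                count = int(field.split("=", 1)[1])
        # Repeated lines for the same node accumulate slots.
        slots[name] = slots.get(name, 0) + count
    return slots


hostfile = "local\nr1\n"
slots = count_hostfile_slots(hostfile)
requested = 2

print(slots)                              # {'local': 1, 'r1': 1}
print(sum(slots.values()) >= requested)   # True
```

By this count the two-line hostfile has room for "-n 2", with local as
one of the usable nodes, so the "node not present in the allocation"
error is unexpected.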