I guess I've hit one of those corner cases that didn't get tested. I can't start 
any apps (not even hostname) after this commit using the rsh PLM (as soon as 
I add a hostfile). mpirun gets stuck in an infinite loop (after it has spawned 
the daemons) in orte_rmaps_base_compute_vpids. Attaching with gdb shows 
that cnt is never incremented, so mpirun spins forever in the while 
loop at line 397.

I used "mpirun -np 2 --bynode ./tp_lb_ub_ng" to start my application, and I 
have a machine file containing two nodes:

node01 slots=8
node02 slots=8

In addition, CTRL+C seems to be broken …

  george.

Begin forwarded message:

> Author: rhc
> Date: 2011-11-14 22:40:11 EST (Mon, 14 Nov 2011)
> New Revision: 25476
> URL: https://svn.open-mpi.org/trac/ompi/changeset/25476
> 
> Log:
> At long last, the fabled revision to the affinity system has arrived. A more 
> detailed explanation of how this all works will be presented here:
> 
> https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement
> 
> The wiki page is incomplete at the moment, but I hope to complete it over the 
> next few days. I will provide updates on the devel list. As the wiki page 
> states, the default and most commonly used options remain unchanged (except 
> as noted below). New, esoteric and complex options have been added, but 
> unless you are a true masochist, you are unlikely to use many of them beyond 
> perhaps an initial curiosity-motivated experimentation.
> 
> In a nutshell, this commit revamps the map/rank/bind procedure to take into 
> account topology info on the compute nodes. I have, for the most part, 
> preserved the default behaviors, with three notable exceptions:
> 
> 1. I have at long last bowed my head in submission to the system admins of 
> managed clusters. For years, they have complained about our default of 
> allowing users to oversubscribe nodes - i.e., to run more processes on a node 
> than allocated slots. Accordingly, I have modified the default behavior: if 
> you are running off of hostfile/dash-host allocated nodes, then the default 
> is to allow oversubscription. If you are running off of RM-allocated nodes, 
> then the default is to NOT allow oversubscription. Flags to override these 
> behaviors are provided, so this only affects the default behavior.
> 
> 2. Both cpus/rank and stride have been removed. The latter was demanded by 
> those who didn't understand the purpose behind it - and I agreed as the users 
> who requested it are no longer using it. The former was removed temporarily 
> pending implementation.
> 
> 3. vm launch is now the sole method for starting OMPI. It was just too darned 
> hard to maintain multiple launch procedures - maybe someday, provided someone 
> can demonstrate a reason to do so.
> 
> As Jeff stated, it is impossible to fully test a change of this size. I have 
> tested it on Linux and Mac, covering all the default and simple options, 
> singletons, and comm_spawn. That said, I'm sure others will find problems, so 
> I'll be watching MTT results until this stabilizes.

