I'll take a look - I tested that case, and the trunk appears to be working on all the MTT runs. I'll have to see if I can replicate it.
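For reference, a minimal, hypothetical sketch of the failure pattern George describes - a round-robin vpid-assignment loop that only advances its counter when a process matches the current node, so one bad corner case leaves cnt stuck at zero forever. This is an illustration only, not the actual orte_rmaps_base_compute_vpids code; the names and the guard are made up:

/* Hypothetical sketch of the reported hang: if no process ever matches
 * in a full pass over the nodes, cnt is never incremented and the
 * while loop never terminates. */
#include <stdio.h>

#define NUM_PROCS 2
#define NUM_NODES 2

int main(void)
{
    int assigned_node[NUM_PROCS] = { -1, -1 };   /* -1 == not yet mapped */
    int cnt = 0;                                 /* vpids assigned so far */
    int node = 0;
    int stalled_passes = 0;                      /* passes with no progress */

    while (cnt < NUM_PROCS) {
        int progressed = 0;

        for (int p = 0; p < NUM_PROCS; p++) {
            /* Suppose the matching test is wrong for this corner case and
             * never selects a process: cnt stays 0 and we spin forever. */
            if (assigned_node[p] == node) {      /* never true: all are -1 */
                cnt++;
                progressed = 1;
            }
        }

        node = (node + 1) % NUM_NODES;

        /* A guard like this turns the silent hang into a visible error. */
        if (!progressed && ++stalled_passes > NUM_NODES) {
            fprintf(stderr, "no progress mapping vpids; aborting\n");
            return 1;
        }
    }

    printf("mapped %d procs\n", cnt);
    return 0;
}

The point of the sketch is just that a loop whose exit condition depends on cnt needs either a guaranteed increment per pass or a no-progress guard; which of the two applies to the real code is exactly what attaching gdb, as George did, would show.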
On Nov 17, 2011, at 7:42 PM, George Bosilca wrote:

> I guess I reached one of those corner cases that didn't get tested. I can't
> start any apps (not even a hostname) after this commit using the rsh PLM (as
> soon as I add a hostfile). The mpirun is blocked in an infinite loop (after it
> spawned the daemons) in orte_rmaps_base_compute_vpids. Attaching with gdb
> indicates that cnt is never incremented, thus the mpirun is stuck forever in
> the while loop at line 397.
>
> I used "mpirun -np 2 --bynode ./tp_lb_ub_ng" to start my application, and I
> have a machine file containing two nodes:
>
> node01 slots=8
> node02 slots=8
>
> In addition, CTRL+C seems to be broken …
>
> george.
>
> Begin forwarded message:
>
>> Author: rhc
>> Date: 2011-11-14 22:40:11 EST (Mon, 14 Nov 2011)
>> New Revision: 25476
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25476
>>
>> Log:
>> At long last, the fabled revision to the affinity system has arrived. A more
>> detailed explanation of how this all works will be presented here:
>>
>> https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement
>>
>> The wiki page is incomplete at the moment, but I hope to complete it over
>> the next few days. I will provide updates on the devel list. As the wiki
>> page states, the default and most commonly used options remain unchanged
>> (except as noted below). New, esoteric, and complex options have been added,
>> but unless you are a true masochist, you are unlikely to use many of them
>> beyond perhaps an initial curiosity-motivated experimentation.
>>
>> In a nutshell, this commit revamps the map/rank/bind procedure to take into
>> account topology info on the compute nodes. I have, for the most part,
>> preserved the default behaviors, with three notable exceptions:
>>
>> 1. I have at long last bowed my head in submission to the system admins of
>> managed clusters. For years, they have complained about our default of
>> allowing users to oversubscribe nodes - i.e., to run more processes on a
>> node than allocated slots. Accordingly, I have modified the default
>> behavior: if you are running off of hostfile/dash-host allocated nodes, then
>> the default is to allow oversubscription. If you are running off of
>> RM-allocated nodes, then the default is to NOT allow oversubscription. Flags
>> to override these behaviors are provided, so this only affects the default
>> behavior.
>>
>> 2. Both cpus/rank and stride have been removed. The latter was demanded by
>> those who didn't understand the purpose behind it - and I agreed, as the
>> users who requested it are no longer using it. The former was removed
>> temporarily pending implementation.
>>
>> 3. VM launch is now the sole method for starting OMPI. It was just too
>> darned hard to maintain multiple launch procedures - maybe someday, provided
>> someone can demonstrate a reason to do so.
>>
>> As Jeff stated, it is impossible to fully test a change of this size. I have
>> tested it on Linux and Mac, covering all the default and simple options,
>> singletons, and comm_spawn. That said, I'm sure others will find problems,
>> so I'll be watching MTT results until this stabilizes.
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel