I have added two improvements. Please replace the previous patch file with the attached one and take a look this weekend.
1. Add a pre-check for ORTE_ERR_NOT_FOUND so that the subsequent retry with byslot works correctly. Otherwise, the retry could fail because some fields, such as node->procs and node->slots_inuse, have already been updated.

2. Improve the detection of oversubscription when node->slots is not a multiple of cpus_per_rank. For example, using node05 and node06 with slots = 8 each, and setting cpus_per_rank = 3 and np = 5, should be treated as oversubscribed, although np x cpus_per_rank (5 x 3 = 15) is less than num_slots (= 16). Each node has room for only two such ranks (8 / 3, rounded down), so only four ranks fit in total. The patch now detects this oversubscription.

Tetsuya

(See attached file: patch.byobj2)

> Hi Tetsuya
>
> Let me take a look when I get home this weekend - I'm giving an ORTE tutorial to a group of new developers this week and my time is very limited.
>
> Thanks
> Ralph
>
>
> On Tue, Mar 25, 2014 at 5:37 PM, <tmish...@jcity.maeda.co.jp> wrote:
>
> Hi Ralph, I moved this to the development list.
>
> I'm not sure why the add_one flag is used in rr_byobj.
> Here, if oversubscribed, procs are mapped to each object
> one by one. So, I think add_one is not necessary.
>
> Instead, when the user doesn't permit oversubscription,
> the second pass should be skipped.
>
> I made the logic a bit clearer based upon this idea and
> removed some outputs to synchronize it with the 1.7 branch.
>
> Please take a look at the attached patch file.
>
> Tetsuya
>
> (See attached file: patch.byobj)
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/03/14393.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/03/14394.php
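P.S. As a rough illustration of the oversubscription case in item 2, the check can be sketched as below. This is not the code in patch.byobj2: the function names and the plain array of slot counts are hypothetical, and the sketch assumes that a rank cannot span two nodes, so leftover slots on one node cannot be combined with those on another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from patch.byobj2): number of ranks that
 * fit on the given nodes without oversubscription when each rank needs
 * cpus_per_rank slots. The division is done per node because leftover
 * slots on different nodes cannot be combined into one rank. */
int max_ranks_without_oversubscription(const int *node_slots,
                                       size_t num_nodes,
                                       int cpus_per_rank)
{
    assert(cpus_per_rank > 0);
    int capacity = 0;
    for (size_t i = 0; i < num_nodes; i++) {
        /* integer division rounds down, e.g. 8 / 3 == 2 */
        capacity += node_slots[i] / cpus_per_rank;
    }
    return capacity;
}

/* Returns 1 if np ranks would oversubscribe the nodes, 0 otherwise. */
int is_oversubscribed(const int *node_slots, size_t num_nodes,
                      int cpus_per_rank, int np)
{
    return np > max_ranks_without_oversubscription(node_slots, num_nodes,
                                                   cpus_per_rank);
}
```

With two nodes of 8 slots each and cpus_per_rank = 3, the capacity is 2 + 2 = 4 ranks, so np = 5 is reported as oversubscribed even though 5 x 3 = 15 is less than 16. A check that only compares np x cpus_per_rank against the total slot count would miss this.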
patch.byobj2
Description: Binary data