Hmmm... I'll take a look. It seems to be working for me under Torque and SLURM, though I cannot vouch for the tree launch. The problem with letting the index start at 0 is that it breaks other things, so I'll have to either fix the routing schemes or find some compromise.
Thanks for the heads up.
Ralph

On Wed, Jul 1, 2009 at 1:49 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:

> Ralph,
>
> This commit break several components in OMPI, mainly the routing schemes
> and the tree launch. The part with the problem is the reduction of the
> number of declared daemons on the second part of the commit, where you
> change the boundary for the for loop from 0 to 1. As a result the number of
> daemons was decreased by one (I guess in order to exclude the HNP), which is
> not something that the routing implementations tolerate.
>
> Setting the loop boundary back to 0 seems to fix all problems. Please
> reconsider your patch.
>
>   george.
>
> On Fri, 26 Jun 2009, r...@osl.iu.edu wrote:
>
>> Author: rhc
>> Date: 2009-06-26 18:07:25 EDT (Fri, 26 Jun 2009)
>> New Revision: 21548
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/21548
>>
>> Log:
>> Cleanup some indexing bugs so that shared memory can function
>>
>> Text files modified:
>>   trunk/orte/util/nidmap.c | 12 +++++++-----
>>   1 files changed, 7 insertions(+), 5 deletions(-)
>>
>> Modified: trunk/orte/util/nidmap.c
>> ==============================================================================
>> --- trunk/orte/util/nidmap.c (original)
>> +++ trunk/orte/util/nidmap.c 2009-06-26 18:07:25 EDT (Fri, 26 Jun 2009)
>> @@ -341,10 +341,10 @@
>>
>>      /* pack every nodename individually */
>>      for (i=1; i < orte_node_pool->size; i++) {
>> +        if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
>> +            continue;
>> +        }
>>          if (!orte_keep_fqdn_hostnames) {
>> -            if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
>> -                continue;
>> -            }
>>              nodename = strdup(node->name);
>>              if (NULL != (ptr = strchr(nodename, '.'))) {
>>                  *ptr = '\0';
>> @@ -553,6 +553,8 @@
>>          ORTE_ERROR_LOG(rc);
>>          return rc;
>>      }
>> +    /* set the daemon to 0 */
>> +    node->daemon = 0;
>>
>>      /* loop over nodes and unpack the raw nodename */
>>      for (i=1; i < num_nodes; i++) {
>> @@ -570,7 +572,7 @@
>>          }
>>      }
>>
>> -    /* unpack the daemon names */
>> +    /* unpack the daemon vpids */
>>      vpids = (orte_vpid_t*)malloc(num_nodes * sizeof(orte_vpid_t));
>>      n=num_nodes;
>>      if (ORTE_SUCCESS != (rc = opal_dss.unpack(&buf, vpids, &n, ORTE_VPID))) {
>> @@ -581,7 +583,7 @@
>>       * daemons in the system
>>       */
>>      num_daemons = 0;
>> -    for (i=0; i < num_nodes; i++) {
>> +    for (i=1; i < num_nodes; i++) {
>>          if (NULL != (ndptr = (orte_nid_t*)opal_pointer_array_get_item(&orte_nidmap, i))) {
>>              ndptr->daemon = vpids[i];
>>              if (ORTE_VPID_INVALID != vpids[i]) {
>
> "We must accept finite disappointment, but we must never lose infinite
> hope."
>   Martin Luther King