Hmmm...I'll take a look. It seems to be working for me under Torque and
SLURM, though I cannot vouch for the tree launch. The problem with letting
the index start at 0 is that it breaks other things, so I'll have to see
about fixing the routing schemes or finding some compromise.
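
For anyone following along, the disagreement comes down to where the
daemon-counting loop starts. Below is a minimal standalone sketch, not the
actual nidmap.c code: count_daemons, VPID_INVALID and sample_vpids are
hypothetical stand-ins, and it assumes the loop simply counts every entry
whose vpid is valid (the diff is cut off before the loop body).

```c
#include <stdint.h>

/* Hypothetical stand-in for ORTE_VPID_INVALID. */
#define VPID_INVALID UINT32_MAX

/* Sample layout: index 0 is the HNP's node, and one node has no daemon. */
static const uint32_t sample_vpids[] = { 0, 1, VPID_INVALID, 2 };

/* Count the entries with a valid daemon vpid, scanning from index `start`.
 * start == 0 includes the HNP's entry; start == 1 skips it. */
static int count_daemons(const uint32_t *vpids, int num_nodes, int start)
{
    int num_daemons = 0;
    for (int i = start; i < num_nodes; i++) {
        if (VPID_INVALID != vpids[i]) {
            num_daemons++;
        }
    }
    return num_daemons;
}
```

With the sample array, count_daemons(sample_vpids, 4, 0) counts the HNP and
returns 3, while count_daemons(sample_vpids, 4, 1) returns 2. That
difference of one is what the routing code appears to trip over.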

Thanks for the heads up.
Ralph


On Wed, Jul 1, 2009 at 1:49 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:

> Ralph,
>
> This commit breaks several components in OMPI, mainly the routing schemes
> and the tree launch. The problem is in the second part of the commit, where
> you change the lower bound of the for loop from 0 to 1. As a result the
> number of declared daemons is decreased by one (I guess in order to exclude
> the HNP), which is not something that the routing implementations tolerate.
>
> Setting the loop boundary back to 0 seems to fix all problems. Please
> reconsider your patch.
>
>  george.
>
> On Fri, 26 Jun 2009, r...@osl.iu.edu wrote:
>
>> Author: rhc
>> Date: 2009-06-26 18:07:25 EDT (Fri, 26 Jun 2009)
>> New Revision: 21548
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/21548
>>
>> Log:
>> Cleanup some indexing bugs so that shared memory can function
>>
>> Text files modified:
>>  trunk/orte/util/nidmap.c |    12 +++++++-----
>>  1 files changed, 7 insertions(+), 5 deletions(-)
>>
>> Modified: trunk/orte/util/nidmap.c
>>
>> ==============================================================================
>> --- trunk/orte/util/nidmap.c    (original)
>> +++ trunk/orte/util/nidmap.c    2009-06-26 18:07:25 EDT (Fri, 26 Jun 2009)
>> @@ -341,10 +341,10 @@
>>
>>    /* pack every nodename individually */
>>    for (i=1; i < orte_node_pool->size; i++) {
>> +        if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
>> +            continue;
>> +        }
>>        if (!orte_keep_fqdn_hostnames) {
>> -            if (NULL == (node = (orte_node_t*)opal_pointer_array_get_item(orte_node_pool, i))) {
>> -                continue;
>> -            }
>>            nodename = strdup(node->name);
>>            if (NULL != (ptr = strchr(nodename, '.'))) {
>>                *ptr = '\0';
>> @@ -553,6 +553,8 @@
>>        ORTE_ERROR_LOG(rc);
>>        return rc;
>>    }
>> +    /* set the daemon to 0 */
>> +    node->daemon = 0;
>>
>>    /* loop over nodes and unpack the raw nodename */
>>    for (i=1; i < num_nodes; i++) {
>> @@ -570,7 +572,7 @@
>>        }
>>    }
>>
>> -    /* unpack the daemon names */
>> +    /* unpack the daemon vpids */
>>    vpids = (orte_vpid_t*)malloc(num_nodes * sizeof(orte_vpid_t));
>>    n=num_nodes;
>>    if (ORTE_SUCCESS != (rc = opal_dss.unpack(&buf, vpids, &n, ORTE_VPID))) {
>> @@ -581,7 +583,7 @@
>>     * daemons in the system
>>     */
>>    num_daemons = 0;
>> -    for (i=0; i < num_nodes; i++) {
>> +    for (i=1; i < num_nodes; i++) {
>>        if (NULL != (ndptr = (orte_nid_t*)opal_pointer_array_get_item(&orte_nidmap, i))) {
>>            ndptr->daemon = vpids[i];
>>            if (ORTE_VPID_INVALID != vpids[i]) {
> "We must accept finite disappointment, but we must never lose infinite
> hope."
>                                  Martin Luther King