Okay - then let's correct the code rather than lose the debug. I'll fix that 
problem.

Thanks
Ralph

On Dec 15, 2013, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> I understand your reasons but the code as it was in the trunk is not correct. 
> In most of the cases when you reach one of the ompi_rte_db_fetch calls, you 
> are setting up an ompi_proc … which means you own the ompi_proc_lock mutex. 
> As the ompi_rte_db_fetch was calling back into the proc infrastructure to 
> find a proc, it was deadlocking on acquiring the ompi_proc_lock mutex as 
> locking this mutex it is the first thing ompi_proc_find is doing.
> 
> A quick grep indicates that most places where the proc_hostname is used are 
> capable of handling NULL, so avoiding a deadlock in exchange for few hostname 
> replaced by NULL in the output seemed like a acceptable approach to me.
> 
>  George.
> 
> On Dec 15, 2013, at 12:18 , Ralph Castain <r...@open-mpi.org> wrote:
> 
>> This actually creates a bit of a problem. The reason we did this was because 
>> the OMPI-layer "show-help" calls want to report the hostname of the proc. 
>> Since we don't retrieve that info by default, the show-help calls all fail 
>> due to a NULL pointer.
>> 
>> Nathan tried wrapping all the show-help calls with a modex-fetch of 
>> hostname, but that isn't a good solution as the fetch might fail depending 
>> on the problem we are trying to report.
>> 
>> We also noted that the modex recv's current implemented all fetched the 
>> complete RTE-level info whenever any info was requested for that proc. So 
>> the fetch of the hostname was a very low cost operation.
>> 
>> So we decided to always ensure we load the hostname info if it hasn't 
>> already been done, thus keeping the show-help operations functional.
>> 
>> Make sense? Or do you have an alternative method?
>> Ralph
>> 
>> 
>> On Dec 15, 2013, at 8:54 AM, svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: bosilca (George Bosilca)
>>> Date: 2013-12-15 11:54:01 EST (Sun, 15 Dec 2013)
>>> New Revision: 29917
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29917
>>> 
>>> Log:
>>> Don't be greedy, just do what we asked for.
>>> 
>>> Text files modified: 
>>> trunk/ompi/mca/rte/orte/rte_orte_module.c |    15 ---------------           
>>>               
>>> 1 files changed, 0 insertions(+), 15 deletions(-)
>>> 
>>> Modified: trunk/ompi/mca/rte/orte/rte_orte_module.c
>>> ==============================================================================
>>> --- trunk/ompi/mca/rte/orte/rte_orte_module.c       Sun Dec 15 11:49:27 
>>> 2013        (r29916)
>>> +++ trunk/ompi/mca/rte/orte/rte_orte_module.c       2013-12-15 11:54:01 EST 
>>> (Sun, 15 Dec 2013)      (r29917)
>>> @@ -153,11 +153,6 @@
>>>   if (OPAL_SUCCESS != (rc = opal_db.fetch((opal_identifier_t*)nm, key, 
>>> data, type))) {
>>>       return rc;
>>>   }
>>> -    /* update the hostname */
>>> -    proct = ompi_proc_find(nm);
>>> -    if (NULL == proct->proc_hostname) {
>>> -        opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, 
>>> (void**)&proct->proc_hostname, OPAL_STRING);
>>> -    }
>>>   return OMPI_SUCCESS;
>>> }
>>> 
>>> @@ -171,11 +166,6 @@
>>>   if (OPAL_SUCCESS != (rc = opal_db.fetch_pointer((opal_identifier_t*)nm, 
>>> key, data, type))) {
>>>       return rc;
>>>   }
>>> -    /* update the hostname */
>>> -    proct = ompi_proc_find(nm);
>>> -    if (NULL == proct->proc_hostname) {
>>> -        opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, 
>>> (void**)&proct->proc_hostname, OPAL_STRING);
>>> -    }
>>>   return OMPI_SUCCESS;
>>> }
>>> 
>>> @@ -191,11 +181,6 @@
>>>                                                    OPAL_SCOPE_GLOBAL, key, 
>>> kvs))) {
>>>       return rc;
>>>   }
>>> -    /* update the hostname */
>>> -    proct = ompi_proc_find(nm);
>>> -    if (NULL == proct->proc_hostname) {
>>> -        opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, 
>>> (void**)&proct->proc_hostname, OPAL_STRING);
>>> -    }
>>>   return OMPI_SUCCESS;
>>> }
>>> 
>>> _______________________________________________
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

Reply via email to