I understand your reasons, but the code as it was in the trunk is not correct. In most cases, when you reach one of the ompi_rte_db_fetch calls you are setting up an ompi_proc … which means you hold the ompi_proc_lock mutex. Because ompi_rte_db_fetch was calling back into the proc infrastructure to find a proc, it deadlocked trying to acquire ompi_proc_lock: locking this mutex is the first thing ompi_proc_find does.
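
To make the re-entrancy problem concrete, here is a minimal standalone sketch. It is not Open MPI code: proc_lock, proc_find, db_fetch_with_lookup and proc_setup are hypothetical stand-ins for ompi_proc_lock, ompi_proc_find, ompi_rte_db_fetch and the proc setup path. It also assumes a POSIX error-checking mutex, so the second lock attempt by the same thread returns EDEADLK instead of hanging the way a plain non-recursive mutex would.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for ompi_proc_lock. */
static pthread_mutex_t proc_lock;

/* Create the lock as an error-checking mutex so that relocking by the
 * owning thread is reported (EDEADLK) rather than blocking forever. */
static void proc_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(0 == rc);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(0 == rc);
    rc = pthread_mutex_init(&proc_lock, &attr);
    assert(0 == rc);
    pthread_mutexattr_destroy(&attr);
    (void)rc;
}

/* Stand-in for ompi_proc_find: taking the lock is the first thing it does. */
static int proc_find(void)
{
    int rc = pthread_mutex_lock(&proc_lock);
    if (0 != rc) {
        return rc;   /* EDEADLK when the caller already holds proc_lock */
    }
    /* ... the proc list would be searched here ... */
    pthread_mutex_unlock(&proc_lock);
    return 0;
}

/* Stand-in for the old fetch path, which called back into the proc layer. */
static int db_fetch_with_lookup(void)
{
    return proc_find();
}

/* Stand-in for proc setup: it holds proc_lock while fetching. */
static int proc_setup(void)
{
    int rc;
    pthread_mutex_lock(&proc_lock);
    rc = db_fetch_with_lookup();   /* re-enters proc_find under the lock */
    pthread_mutex_unlock(&proc_lock);
    return rc;
}
```

After proc_lock_init(), calling proc_find() on its own returns 0, while proc_setup() returns EDEADLK because the lookup is attempted by the thread that already owns the lock. With a default mutex the same call sequence would simply never return.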
A quick grep indicates that most places where proc_hostname is used are capable of handling NULL, so avoiding a deadlock in exchange for a few hostnames replaced by NULL in the output seemed like an acceptable approach to me.

  George.

On Dec 15, 2013, at 12:18 , Ralph Castain <r...@open-mpi.org> wrote:

> This actually creates a bit of a problem. The reason we did this was because
> the OMPI-layer "show-help" calls want to report the hostname of the proc.
> Since we don't retrieve that info by default, the show-help calls all fail
> due to a NULL pointer.
>
> Nathan tried wrapping all the show-help calls with a modex-fetch of hostname,
> but that isn't a good solution as the fetch might fail depending on the
> problem we are trying to report.
>
> We also noted that the modex recv's current implemented all fetched the
> complete RTE-level info whenever any info was requested for that proc. So the
> fetch of the hostname was a very low cost operation.
>
> So we decided to always ensure we load the hostname info if it hasn't already
> been done, thus keeping the show-help operations functional.
>
> Make sense? Or do you have an alternative method?
> Ralph
>
>
> On Dec 15, 2013, at 8:54 AM, svn-commit-mai...@open-mpi.org wrote:
>
>> Author: bosilca (George Bosilca)
>> Date: 2013-12-15 11:54:01 EST (Sun, 15 Dec 2013)
>> New Revision: 29917
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29917
>>
>> Log:
>> Don't be greedy, just do what we asked for.
>>
>> Text files modified:
>>    trunk/ompi/mca/rte/orte/rte_orte_module.c |    15 ---------------
>>
>>    1 files changed, 0 insertions(+), 15 deletions(-)
>>
>> Modified: trunk/ompi/mca/rte/orte/rte_orte_module.c
>> ==============================================================================
>> --- trunk/ompi/mca/rte/orte/rte_orte_module.c Sun Dec 15 11:49:27 2013 (r29916)
>> +++ trunk/ompi/mca/rte/orte/rte_orte_module.c 2013-12-15 11:54:01 EST (Sun, 15 Dec 2013) (r29917)
>> @@ -153,11 +153,6 @@
>>      if (OPAL_SUCCESS != (rc = opal_db.fetch((opal_identifier_t*)nm, key, data, type))) {
>>          return rc;
>>      }
>> -    /* update the hostname */
>> -    proct = ompi_proc_find(nm);
>> -    if (NULL == proct->proc_hostname) {
>> -        opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, (void**)&proct->proc_hostname, OPAL_STRING);
>> -    }
>>      return OMPI_SUCCESS;
>>  }
>>
>> @@ -171,11 +166,6 @@
>>      if (OPAL_SUCCESS != (rc = opal_db.fetch_pointer((opal_identifier_t*)nm, key, data, type))) {
>>          return rc;
>>      }
>> -    /* update the hostname */
>> -    proct = ompi_proc_find(nm);
>> -    if (NULL == proct->proc_hostname) {
>> -        opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, (void**)&proct->proc_hostname, OPAL_STRING);
>> -    }
>>      return OMPI_SUCCESS;
>>  }
>>
>> @@ -191,11 +181,6 @@
>>                                                         OPAL_SCOPE_GLOBAL, key, kvs))) {
>>          return rc;
>>      }
>> -    /* update the hostname */
>> -    proct = ompi_proc_find(nm);
>> -    if (NULL == proct->proc_hostname) {
>> -        opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, (void**)&proct->proc_hostname, OPAL_STRING);
>> -    }
>>      return OMPI_SUCCESS;
>>  }
>>
>> _______________________________________________
>> svn mailing list
>> s...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel