The current code is correct. If the goal is to continue to retrieve the proc_name in a call that is asking for something else, I don’t see how you can remove this circular dependency between setting and using the ompi_procs.
George. On Dec 15, 2013, at 13:52 , Ralph Castain <r...@open-mpi.org> wrote: > Okay - then let's correct the code rather than lose the debug. I'll fix that > problem. > > Thanks > Ralph > > On Dec 15, 2013, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > >> I understand your reasons but the code as it was in the trunk is not >> correct. In most of the cases when you reach one of the ompi_rte_db_fetch >> calls, you are setting up an ompi_proc … which means you own the >> ompi_proc_lock mutex. As the ompi_rte_db_fetch was calling back into the >> proc infrastructure to find a proc, it was deadlocking on acquiring the >> ompi_proc_lock mutex as locking this mutex it is the first thing >> ompi_proc_find is doing. >> >> A quick grep indicates that most places where the proc_hostname is used are >> capable of handling NULL, so avoiding a deadlock in exchange for few >> hostname replaced by NULL in the output seemed like a acceptable approach to >> me. >> >> George. >> >> On Dec 15, 2013, at 12:18 , Ralph Castain <r...@open-mpi.org> wrote: >> >>> This actually creates a bit of a problem. The reason we did this was >>> because the OMPI-layer "show-help" calls want to report the hostname of the >>> proc. Since we don't retrieve that info by default, the show-help calls all >>> fail due to a NULL pointer. >>> >>> Nathan tried wrapping all the show-help calls with a modex-fetch of >>> hostname, but that isn't a good solution as the fetch might fail depending >>> on the problem we are trying to report. >>> >>> We also noted that the modex recv's current implemented all fetched the >>> complete RTE-level info whenever any info was requested for that proc. So >>> the fetch of the hostname was a very low cost operation. >>> >>> So we decided to always ensure we load the hostname info if it hasn't >>> already been done, thus keeping the show-help operations functional. >>> >>> Make sense? Or do you have an alternative method? >>> Ralph >>> >>> >>> On Dec 15, 2013, at 8:54 AM, svn-commit-mai...@open-mpi.org wrote: >>> >>>> Author: bosilca (George Bosilca) >>>> Date: 2013-12-15 11:54:01 EST (Sun, 15 Dec 2013) >>>> New Revision: 29917 >>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29917 >>>> >>>> Log: >>>> Don't be greedy, just do what we asked for. >>>> >>>> Text files modified: >>>> trunk/ompi/mca/rte/orte/rte_orte_module.c | 15 --------------- >>>> >>>> 1 files changed, 0 insertions(+), 15 deletions(-) >>>> >>>> Modified: trunk/ompi/mca/rte/orte/rte_orte_module.c >>>> ============================================================================== >>>> --- trunk/ompi/mca/rte/orte/rte_orte_module.c Sun Dec 15 11:49:27 >>>> 2013 (r29916) >>>> +++ trunk/ompi/mca/rte/orte/rte_orte_module.c 2013-12-15 11:54:01 EST >>>> (Sun, 15 Dec 2013) (r29917) >>>> @@ -153,11 +153,6 @@ >>>> if (OPAL_SUCCESS != (rc = opal_db.fetch((opal_identifier_t*)nm, key, >>>> data, type))) { >>>> return rc; >>>> } >>>> - /* update the hostname */ >>>> - proct = ompi_proc_find(nm); >>>> - if (NULL == proct->proc_hostname) { >>>> - opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, >>>> (void**)&proct->proc_hostname, OPAL_STRING); >>>> - } >>>> return OMPI_SUCCESS; >>>> } >>>> >>>> @@ -171,11 +166,6 @@ >>>> if (OPAL_SUCCESS != (rc = opal_db.fetch_pointer((opal_identifier_t*)nm, >>>> key, data, type))) { >>>> return rc; >>>> } >>>> - /* update the hostname */ >>>> - proct = ompi_proc_find(nm); >>>> - if (NULL == proct->proc_hostname) { >>>> - opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, >>>> (void**)&proct->proc_hostname, OPAL_STRING); >>>> - } >>>> return OMPI_SUCCESS; >>>> } >>>> >>>> @@ -191,11 +181,6 @@ >>>> OPAL_SCOPE_GLOBAL, key, >>>> kvs))) { >>>> return rc; >>>> } >>>> - /* update the hostname */ >>>> - proct = ompi_proc_find(nm); >>>> - if (NULL == proct->proc_hostname) { >>>> - opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, >>>> (void**)&proct->proc_hostname, OPAL_STRING); >>>> - } >>>> return OMPI_SUCCESS; >>>> } >>>> >>>> _______________________________________________ >>>> svn mailing list >>>> s...@open-mpi.org >>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn >>> >>> _______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel