Sure you can - just find the ompi_proc_t without setting the thread lock since (as you point out) it is already being held. Or we could pass the ompi_proc_t into the call, if necessary. Either way should work.
On Dec 15, 2013, at 11:30 AM, George Bosilca <bosi...@icl.utk.edu> wrote: > The current code is correct. If the goal is to continue to retrieve the > proc_name in a call that is asking for something else, I don’t see how you > can remove this circular dependency between setting and using the ompi_procs. > > George. > > On Dec 15, 2013, at 13:52 , Ralph Castain <r...@open-mpi.org> wrote: > >> Okay - then let's correct the code rather than lose the debug. I'll fix that >> problem. >> >> Thanks >> Ralph >> >> On Dec 15, 2013, at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> wrote: >> >>> I understand your reasons but the code as it was in the trunk is not >>> correct. In most of the cases when you reach one of the ompi_rte_db_fetch >>> calls, you are setting up an ompi_proc … which means you own the >>> ompi_proc_lock mutex. As the ompi_rte_db_fetch was calling back into the >>> proc infrastructure to find a proc, it was deadlocking on acquiring the >>> ompi_proc_lock mutex as locking this mutex it is the first thing >>> ompi_proc_find is doing. >>> >>> A quick grep indicates that most places where the proc_hostname is used are >>> capable of handling NULL, so avoiding a deadlock in exchange for few >>> hostname replaced by NULL in the output seemed like a acceptable approach >>> to me. >>> >>> George. >>> >>> On Dec 15, 2013, at 12:18 , Ralph Castain <r...@open-mpi.org> wrote: >>> >>>> This actually creates a bit of a problem. The reason we did this was >>>> because the OMPI-layer "show-help" calls want to report the hostname of >>>> the proc. Since we don't retrieve that info by default, the show-help >>>> calls all fail due to a NULL pointer. >>>> >>>> Nathan tried wrapping all the show-help calls with a modex-fetch of >>>> hostname, but that isn't a good solution as the fetch might fail depending >>>> on the problem we are trying to report. >>>> >>>> We also noted that the modex recv's current implemented all fetched the >>>> complete RTE-level info whenever any info was requested for that proc. So >>>> the fetch of the hostname was a very low cost operation. >>>> >>>> So we decided to always ensure we load the hostname info if it hasn't >>>> already been done, thus keeping the show-help operations functional. >>>> >>>> Make sense? Or do you have an alternative method? >>>> Ralph >>>> >>>> >>>> On Dec 15, 2013, at 8:54 AM, svn-commit-mai...@open-mpi.org wrote: >>>> >>>>> Author: bosilca (George Bosilca) >>>>> Date: 2013-12-15 11:54:01 EST (Sun, 15 Dec 2013) >>>>> New Revision: 29917 >>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29917 >>>>> >>>>> Log: >>>>> Don't be greedy, just do what we asked for. >>>>> >>>>> Text files modified: >>>>> trunk/ompi/mca/rte/orte/rte_orte_module.c | 15 --------------- >>>>> >>>>> 1 files changed, 0 insertions(+), 15 deletions(-) >>>>> >>>>> Modified: trunk/ompi/mca/rte/orte/rte_orte_module.c >>>>> ============================================================================== >>>>> --- trunk/ompi/mca/rte/orte/rte_orte_module.c Sun Dec 15 11:49:27 >>>>> 2013 (r29916) >>>>> +++ trunk/ompi/mca/rte/orte/rte_orte_module.c 2013-12-15 11:54:01 EST >>>>> (Sun, 15 Dec 2013) (r29917) >>>>> @@ -153,11 +153,6 @@ >>>>> if (OPAL_SUCCESS != (rc = opal_db.fetch((opal_identifier_t*)nm, key, >>>>> data, type))) { >>>>> return rc; >>>>> } >>>>> - /* update the hostname */ >>>>> - proct = ompi_proc_find(nm); >>>>> - if (NULL == proct->proc_hostname) { >>>>> - opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, >>>>> (void**)&proct->proc_hostname, OPAL_STRING); >>>>> - } >>>>> return OMPI_SUCCESS; >>>>> } >>>>> >>>>> @@ -171,11 +166,6 @@ >>>>> if (OPAL_SUCCESS != (rc = opal_db.fetch_pointer((opal_identifier_t*)nm, >>>>> key, data, type))) { >>>>> return rc; >>>>> } >>>>> - /* update the hostname */ >>>>> - proct = ompi_proc_find(nm); >>>>> - if (NULL == proct->proc_hostname) { >>>>> - opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, >>>>> (void**)&proct->proc_hostname, OPAL_STRING); >>>>> - } >>>>> return OMPI_SUCCESS; >>>>> } >>>>> >>>>> @@ -191,11 +181,6 @@ >>>>> OPAL_SCOPE_GLOBAL, key, >>>>> kvs))) { >>>>> return rc; >>>>> } >>>>> - /* update the hostname */ >>>>> - proct = ompi_proc_find(nm); >>>>> - if (NULL == proct->proc_hostname) { >>>>> - opal_db.fetch_pointer((opal_identifier_t*)nm, ORTE_DB_HOSTNAME, >>>>> (void**)&proct->proc_hostname, OPAL_STRING); >>>>> - } >>>>> return OMPI_SUCCESS; >>>>> } >>>>> >>>>> _______________________________________________ >>>>> svn mailing list >>>>> s...@open-mpi.org >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn >>>> >>>> _______________________________________________ >>>> devel mailing list >>>> de...@open-mpi.org >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> >>> _______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel