On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: > We don't want to prevent two thread from entering the code is same time. > The algorithm you cited support this case. There is only one moment that is Are you sure it support this case? There is a global var mask_in_use that prevent multiple access.
> critical. The local selection of the next available cid. And this is what > we try to protect there. If after the first run, the collective call do not > manage to figure out the correct next_cid then we will execute the while > loop again. And then this condition make sense, as only the thread running > on the smallest communicator cid will continue. This insure that it will > pickup the smallest next available cid, and then it's reduce operation will > succeed. The other threads will wait until the selection of the next > available cid is unlocked. > > Without the code you removed we face a deadlock situation. Multiple threads > will pick different next_cid on each process and thy will never succeed > with the reduce operation. And this is what we're trying to avoid with the > test. OK. I think now I get the idea behind this test. I'll restore it and leave ompi_comm_unregister_cid() fix in place. Is this OK? > > george. > > On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote: > >> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: >>> Gleb, >>> >>> This patch is not correct. The code preventing the registration of the >>> same >>> communicator twice is later in the code (same file in the function >>> ompi_comm_register_cid line 326). Once the function >>> ompi_comm_register_cid >> I saw this code and the comment. The problem is not with the same >> communicator but with different communicators. >> >>> is called, we know that each communicator only handle one "communicator >>> creation" function at the same time. Therefore, we want to give priority >>> to >>> the smallest com_id, which is what happens in the code you removed. >> The code I removed was doing it wrongly. I.e the algorithm sometimes is >> executed >> for different communicators simultaneously by different threads. Think >> about the case where the function is running for cid 1 and then another >> thread runs it for cid 0. cid 0 will proceed although the function is >> executed on another CPU. And this is not something theoretical, that >> is happening with sun's thread test suit mpi_coll test case. >> >>> >>> Without the condition in the ompi_comm_register_cid (each communicator >>> only >>> get registered once) your comment make sense. However, with the condition >>> your patch allow a dead end situation, while 2 processes try to create >>> communicators in multiple threads, and they will never succeed, simply >>> because they will not order the creation based on the com_id. >> If the algorithm is really prone to deadlock in case it is concurrently >> executed for several different communicators (I haven't check this), >> then we may want to fix original code to really prevent two threads to >> enter the function, but then I don't see the reason for all those >> complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() >> The algorithm described here: >> http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2 >> in section 5.3 works without it and we can do something similar. >> >>> >>> george. >>> >>> >>> >>> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: >>> >>>> Author: gleb >>>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) >>>> New Revision: 16088 >>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 >>>> >>>> Log: >>>> The code tries to prevent itself from running for more then one >>>> communicator >>>> simultaneously, but is doing it incorrectly. If the function is running >>>> already >>>> for one communicator and it is called from another thread for other >>>> communicator >>>> with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() >>>> will fail and the function will be executed for two different >>>> communicators by >>>> two threads simultaneously. There is nothing in the algorithm that >>>> prevent >>>> it >>>> from been running simultaneously for different communicators as far as I >>>> can see, >>>> but ompi_comm_unregister_cid() assumes that it is always called for a >>>> communicator >>>> with the lowest cid and this is not always the case. This patch removes >>>> bogus >>>> lowest cid check and fix ompi_comm_register_cid() to properly remove cid >>>> from >>>> the list. >>>> >>>> Text files modified: >>>> trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------ >>>> 1 files changed, 12 insertions(+), 12 deletions(-) >>>> >>>> Modified: trunk/ompi/communicator/comm_cid.c >>>> ============================================================================== >>>> --- trunk/ompi/communicator/comm_cid.c (original) >>>> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 >>>> Sep 2007) >>>> @@ -11,6 +11,7 @@ >>>> * All rights reserved. >>>> * Copyright (c) 2006-2007 University of Houston. All rights reserved. >>>> * Copyright (c) 2007 Cisco, Inc. All rights reserved. >>>> + * Copyright (c) 2007 Voltaire All rights reserved. >>>> * $COPYRIGHT$ >>>> * >>>> * Additional copyrights may follow >>>> @@ -170,15 +171,6 @@ >>>> * This is the real algorithm described in the doc >>>> */ >>>> >>>> - OPAL_THREAD_LOCK(&ompi_cid_lock); >>>> - if (comm->c_contextid != ompi_comm_lowest_cid() ) { >>>> - /* if not lowest cid, we do not continue, but sleep and >>>> try again */ >>>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock); >>>> - continue; >>>> - } >>>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock); >>>> - >>>> - >>>> for (i=start; i < mca_pml.pml_max_contextid ; i++) { >>>> >>>> flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators, >>>> i, comm); >>>> @@ -365,10 +357,18 @@ >>>> >>>> static int ompi_comm_unregister_cid (uint32_t cid) >>>> { >>>> - ompi_comm_reg_t *regcom=NULL; >>>> - opal_list_item_t >>>> *item=opal_list_remove_first(&ompi_registered_comms); >>>> + ompi_comm_reg_t *regcom; >>>> + opal_list_item_t *item; >>>> >>>> - regcom = (ompi_comm_reg_t *) item; >>>> + for (item = opal_list_get_first(&ompi_registered_comms); >>>> + item != opal_list_get_end(&ompi_registered_comms); >>>> + item = opal_list_get_next(item)) { >>>> + regcom = (ompi_comm_reg_t *)item; >>>> + if(regcom->cid == cid) { >>>> + opal_list_remove_item(&ompi_registered_comms, item); >>>> + break; >>>> + } >>>> + } >>>> OBJ_RELEASE(regcom); >>>> return OMPI_SUCCESS; >>>> } >>>> _______________________________________________ >>>> svn-full mailing list >>>> svn-f...@open-mpi.org >>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full >>> >> >> >> >>> _______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> -- >> Gleb. >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Gleb.