On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote:
>
> On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
>>> We don't want to prevent two threads from entering the code at the same time.
>>> The algorithm you cited supports this case. There is only one moment that is
>> Are you sure it supports this case? There is a global var mask_in_use
>> that prevents multiple access.
>
> I'm unable to find the mask_in_use global variable. Where is it?

I thought that by "the algorithm you cited" you meant the algorithm described in
the link I provided. There is a mask_in_use global var there that IMO ensures
the algorithm is executed for only one communicator at a time.
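For reference, a rough sketch of that mask_in_use scheme from section 5.3 of the
paper linked further down. All names below (cid_free_mask, cid_alloc_pass, and
so on) are made up for illustration; this is neither the MPICH nor the Open MPI
code, and the paper's confirmation step is omitted:

#include <mpi.h>
#include <pthread.h>
#include <string.h>

#define CID_WORDS 64                          /* 64 * 32 = 2048 context ids */

static unsigned cid_free_mask[CID_WORDS];     /* bit set => cid is free     */
static int mask_in_use = 0;                   /* the global gate            */
static pthread_mutex_t cid_lock = PTHREAD_MUTEX_INITIALIZER;

/* One pass of the allocation loop for 'comm'.  Returns the agreed cid,
 * or -1 if this pass did not converge and the caller must retry. */
static int cid_alloc_pass(MPI_Comm comm)
{
    unsigned local[CID_WORDS], agreed[CID_WORDS];
    int have_mask, i, b, cid = -1;

    /* Only the thread that grabs the gate contributes the real free mask;
     * any other thread in this process contributes all zeros, so no cid
     * can be agreed on behalf of this process in this round. */
    pthread_mutex_lock(&cid_lock);
    have_mask = !mask_in_use;
    if (have_mask) {
        mask_in_use = 1;
        memcpy(local, cid_free_mask, sizeof(local));
    } else {
        memset(local, 0, sizeof(local));
    }
    pthread_mutex_unlock(&cid_lock);

    /* Bitwise AND across the communicator: a bit survives only if that
     * cid is free on every process and the gate was held everywhere. */
    MPI_Allreduce(local, agreed, CID_WORDS, MPI_UNSIGNED, MPI_BAND, comm);

    pthread_mutex_lock(&cid_lock);
    for (i = 0; i < CID_WORDS && cid < 0; i++) {
        for (b = 0; b < 32; b++) {
            if (agreed[i] & (1u << b)) {
                cid = i * 32 + b;             /* lowest agreed cid          */
                cid_free_mask[i] &= ~(1u << b);
                break;
            }
        }
    }
    if (have_mask)
        mask_in_use = 0;                      /* release the gate           */
    pthread_mutex_unlock(&cid_lock);

    return cid;    /* -1: some process lost the gate race, retry the pass   */
}

Because the gate is per process, at most one communicator's allocation can use
the real free mask at a time; every other concurrent attempt contributes zeros
and simply retries.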
>
> george.
>
>>
>>> critical: the local selection of the next available cid. And this is what
>>> we try to protect there. If after the first run the collective call does not
>>> manage to figure out the correct next_cid, then we will execute the while
>>> loop again. And then this condition makes sense, as only the thread running
>>> on the smallest communicator cid will continue. This ensures that it will
>>> pick up the smallest next available cid, and then its reduce operation will
>>> succeed. The other threads will wait until the selection of the next
>>> available cid is unlocked.
>>>
>>> Without the code you removed we face a deadlock situation. Multiple threads
>>> will pick a different next_cid on each process and they will never succeed
>>> with the reduce operation. And this is what we're trying to avoid with the
>>> test.
>> OK. I think now I get the idea behind this test. I'll restore it and
>> leave the ompi_comm_unregister_cid() fix in place. Is this OK?
>>
>>>
>>> george.
>>>
>>> On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote:
>>>
>>>> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
>>>>> Gleb,
>>>>>
>>>>> This patch is not correct. The code preventing the registration of the same
>>>>> communicator twice is later in the code (same file, in the function
>>>>> ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid
>>>> I saw this code and the comment. The problem is not with the same
>>>> communicator but with different communicators.
>>>>
>>>>> is called, we know that each communicator only handles one "communicator
>>>>> creation" function at a time. Therefore, we want to give priority to
>>>>> the smallest com_id, which is what happens in the code you removed.
>>>> The code I removed was doing it wrongly, i.e. the algorithm is sometimes
>>>> executed for different communicators simultaneously by different threads.
>>>> Think about the case where the function is running for cid 1 and then another
>>>> thread runs it for cid 0. cid 0 will proceed although the function is being
>>>> executed on another CPU. And this is not something theoretical; it is
>>>> happening with Sun's threads test suite mpi_coll test case.
>>>>
>>>>>
>>>>> Without the condition in ompi_comm_register_cid (each communicator only
>>>>> gets registered once) your comment makes sense. However, with the condition
>>>>> your patch allows a deadlock situation where 2 processes try to create
>>>>> communicators in multiple threads and never succeed, simply because they
>>>>> do not order the creation based on the com_id.
>>>> If the algorithm is really prone to deadlock when it is concurrently
>>>> executed for several different communicators (I haven't checked this),
>>>> then we may want to fix the original code to really prevent two threads from
>>>> entering the function, but then I don't see the reason for all those
>>>> complications with ompi_comm_register_cid()/ompi_comm_unregister_cid().
>>>> The algorithm described here:
>>>> http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2
>>>> in section 5.3 works without it, and we can do something similar.
>>>>
>>>>>
>>>>> george.
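The case Gleb describes above (the selection already running for cid 1 when
another thread starts it for cid 0, so both pass the lowest-cid check) can be
reproduced outside Open MPI with a small standalone model. Everything below is
plain pthreads with made-up names; it is only a sketch of how that guard
behaves, not the OMPI code:

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned registered[2];
static int n_registered = 0;
static int inside = 0;                        /* threads currently "in" */

static void register_cid(unsigned cid)
{
    pthread_mutex_lock(&lock);
    registered[n_registered++] = cid;
    pthread_mutex_unlock(&lock);
}

static void unregister_cid(unsigned cid)
{
    int i;
    pthread_mutex_lock(&lock);
    for (i = 0; i < n_registered; i++)
        if (registered[i] == cid) {           /* remove this cid's entry */
            registered[i] = registered[--n_registered];
            break;
        }
    pthread_mutex_unlock(&lock);
}

static unsigned lowest_cid(void)
{
    unsigned low = ~0u;
    int i;
    pthread_mutex_lock(&lock);
    for (i = 0; i < n_registered; i++)
        if (registered[i] < low)
            low = registered[i];
    pthread_mutex_unlock(&lock);
    return low;
}

static void *create_comm(void *arg)
{
    unsigned cid = *(unsigned *)arg;

    register_cid(cid);
    while (lowest_cid() != cid)               /* the guard in question   */
        sched_yield();

    pthread_mutex_lock(&lock);
    printf("cid %u entered, %d thread(s) already inside\n", cid, inside++);
    pthread_mutex_unlock(&lock);
    sleep(1);                                 /* keep the window open    */
    pthread_mutex_lock(&lock);
    inside--;
    pthread_mutex_unlock(&lock);

    unregister_cid(cid);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    unsigned cid1 = 1, cid0 = 0;

    pthread_create(&a, NULL, create_comm, &cid1); /* registers cid 1 first  */
    usleep(100000);                               /* let it pass the guard  */
    pthread_create(&b, NULL, create_comm, &cid0); /* cid 0 is now lowest,   */
    pthread_join(a, NULL);                        /* so it passes the guard */
    pthread_join(b, NULL);                        /* while cid 1 is inside  */
    return 0;
}

Typically this prints that cid 0 entered while one thread was already inside:
the check orders threads that arrive before anyone has entered, but it does not
exclude a later, lower cid once a higher cid is already running.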
>>>>>
>>>>> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:
>>>>>
>>>>>> Author: gleb
>>>>>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>>>> New Revision: 16088
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>>>>>>
>>>>>> Log:
>>>>>> The code tries to prevent itself from running for more than one communicator
>>>>>> simultaneously, but does so incorrectly. If the function is already running
>>>>>> for one communicator and it is called from another thread for another
>>>>>> communicator with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid()
>>>>>> will fail and the function will be executed for two different communicators
>>>>>> by two threads simultaneously. There is nothing in the algorithm that
>>>>>> prevents it from running simultaneously for different communicators as far
>>>>>> as I can see, but ompi_comm_unregister_cid() assumes that it is always
>>>>>> called for the communicator with the lowest cid, and this is not always the
>>>>>> case. This patch removes the bogus lowest-cid check and fixes
>>>>>> ompi_comm_unregister_cid() to properly remove the cid from the list.
>>>>>>
>>>>>> Text files modified:
>>>>>>    trunk/ompi/communicator/comm_cid.c |    24 ++++++++++++------------
>>>>>>    1 files changed, 12 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> Modified: trunk/ompi/communicator/comm_cid.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/communicator/comm_cid.c (original)
>>>>>> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>>>> @@ -11,6 +11,7 @@
>>>>>>   *                         All rights reserved.
>>>>>>   * Copyright (c) 2006-2007 University of Houston. All rights reserved.
>>>>>>   * Copyright (c) 2007      Cisco, Inc. All rights reserved.
>>>>>> + * Copyright (c) 2007      Voltaire All rights reserved.
>>>>>>   * $COPYRIGHT$
>>>>>>   *
>>>>>>   * Additional copyrights may follow
>>>>>> @@ -170,15 +171,6 @@
>>>>>>       * This is the real algorithm described in the doc
>>>>>>       */
>>>>>>
>>>>>> -        OPAL_THREAD_LOCK(&ompi_cid_lock);
>>>>>> -        if (comm->c_contextid != ompi_comm_lowest_cid() ) {
>>>>>> -            /* if not lowest cid, we do not continue, but sleep and try again */
>>>>>> -            OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>>>> -            continue;
>>>>>> -        }
>>>>>> -        OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>>>> -
>>>>>> -
>>>>>>          for (i=start; i < mca_pml.pml_max_contextid ; i++) {
>>>>>>              flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators, i, comm);
>>>>>> @@ -365,10 +357,18 @@
>>>>>>
>>>>>>  static int ompi_comm_unregister_cid (uint32_t cid)
>>>>>>  {
>>>>>> -    ompi_comm_reg_t *regcom=NULL;
>>>>>> -    opal_list_item_t *item=opal_list_remove_first(&ompi_registered_comms);
>>>>>> +    ompi_comm_reg_t *regcom;
>>>>>> +    opal_list_item_t *item;
>>>>>>
>>>>>> -    regcom = (ompi_comm_reg_t *) item;
>>>>>> +    for (item = opal_list_get_first(&ompi_registered_comms);
>>>>>> +         item != opal_list_get_end(&ompi_registered_comms);
>>>>>> +         item = opal_list_get_next(item)) {
>>>>>> +        regcom = (ompi_comm_reg_t *)item;
>>>>>> +        if(regcom->cid == cid) {
>>>>>> +            opal_list_remove_item(&ompi_registered_comms, item);
>>>>>> +            break;
>>>>>> +        }
>>>>>> +    }
>>>>>>      OBJ_RELEASE(regcom);
>>>>>>      return OMPI_SUCCESS;
>>>>>>  }
>>>>>> _______________________________________________
>>>>>> svn-full mailing list
>>>>>> svn-f...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full

--
Gleb.
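As a postscript, a condensed standalone model of why the ompi_comm_unregister_cid()
change in the diff matters once two creations are in flight. Plain C arrays stand
in for the opal_list here; none of the names below are the actual OMPI identifiers:

#include <stdio.h>

static unsigned registered[8];
static int n_registered = 0;

static void register_cid(unsigned cid) { registered[n_registered++] = cid; }

/* Old behaviour: drop the first (oldest) entry, ignoring the caller's cid. */
static void unregister_first(void)
{
    int i;
    for (i = 1; i < n_registered; i++)
        registered[i - 1] = registered[i];
    n_registered--;
}

/* New behaviour (what the diff implements): drop the matching entry. */
static void unregister_by_cid(unsigned cid)
{
    int i;
    for (i = 0; i < n_registered; i++)
        if (registered[i] == cid) {
            for (; i + 1 < n_registered; i++)
                registered[i] = registered[i + 1];
            n_registered--;
            return;
        }
}

int main(void)
{
    /* Two creations in flight: cid 0 registered first, then cid 1.       */
    register_cid(0);
    register_cid(1);

    /* The creation for cid 1 happens to finish first.  The old code      */
    /* removes the head of the list, i.e. cid 0's registration.           */
    unregister_first();
    printf("old code, after cid 1 finishes: still registered = %u\n",
           registered[0]);                       /* prints 1, not 0       */

    /* Same situation with the fixed code: cid 0 stays registered.        */
    n_registered = 0;
    register_cid(0);
    register_cid(1);
    unregister_by_cid(1);
    printf("new code, after cid 1 finishes: still registered = %u\n",
           registered[0]);                       /* prints 0              */
    return 0;
}

With two registrations pending, removing the first entry releases whichever
registration happens to be at the head, which is the wrong one whenever the
newer communicator finishes first; searching for the matching cid avoids that.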