On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote: > Gleb, > > in the scenario which you describe in the comment to the patch, what > should happen is, that the communicator with the cid which started > already the allreduce will basically 'hang' until the other processes > 'allow' the lower cids to continue. It should basically be blocked in > the allreduce. Why? Two threads are allowed to run allreduce simultaneously for different communicators. Are they?
> > However, here is something, where we might have problems with the sun > thread tests (and we discussed that with Terry already): the cid > allocation algorithm as implemented in Open MPI assumes ( -- this was/is > my/our understanding of the standard --) that a communicator creation is > a collective operation. This means, you can not have a comm_create and > another allreduce of the same communicator running in different threads, > because these allreduces will mix up and produce non-sense results. We > fixed the case, if all collective operations are comm_creates, but if > some of the threads are in a comm_create and some are in allreduce on > the same communicator, it won't work. Correct, but this is not what happens with mt_coll test. mt_coll calls commdup on the same communicator in different threads concurrently, but we handle this case inside ompi_comm_nextcid(). > > > Gleb Natapov wrote: > > On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: > >> Gleb, > >> > >> This patch is not correct. The code preventing the registration of the > >> same > >> communicator twice is later in the code (same file in the function > >> ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid > > I saw this code and the comment. The problem is not with the same > > communicator but with different communicators. > > > >> is called, we know that each communicator only handle one "communicator > >> creation" function at the same time. Therefore, we want to give priority > >> to > >> the smallest com_id, which is what happens in the code you removed. > > The code I removed was doing it wrongly. I.e the algorithm sometimes is > > executed > > for different communicators simultaneously by different threads. Think > > about the case where the function is running for cid 1 and then another > > thread runs it for cid 0. cid 0 will proceed although the function is > > executed on another CPU. And this is not something theoretical, that > > is happening with sun's thread test suit mpi_coll test case. > > > >> Without the condition in the ompi_comm_register_cid (each communicator > >> only > >> get registered once) your comment make sense. However, with the condition > >> your patch allow a dead end situation, while 2 processes try to create > >> communicators in multiple threads, and they will never succeed, simply > >> because they will not order the creation based on the com_id. > > If the algorithm is really prone to deadlock in case it is concurrently > > executed for several different communicators (I haven't check this), > > then we may want to fix original code to really prevent two threads to > > enter the function, but then I don't see the reason for all those > > complications with ompi_comm_register_cid()/ompi_comm_unregister_cid() > > The algorithm described here: > > http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2 > > in section 5.3 works without it and we can do something similar. > > > >> george. > >> > >> > >> > >> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote: > >> > >>> Author: gleb > >>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007) > >>> New Revision: 16088 > >>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088 > >>> > >>> Log: > >>> The code tries to prevent itself from running for more then one > >>> communicator > >>> simultaneously, but is doing it incorrectly. If the function is running > >>> already > >>> for one communicator and it is called from another thread for other > >>> communicator > >>> with lower cid the check comm->c_contextid != ompi_comm_lowest_cid() > >>> will fail and the function will be executed for two different > >>> communicators by > >>> two threads simultaneously. There is nothing in the algorithm that > >>> prevent > >>> it > >>> from been running simultaneously for different communicators as far as I > >>> can see, > >>> but ompi_comm_unregister_cid() assumes that it is always called for a > >>> communicator > >>> with the lowest cid and this is not always the case. This patch removes > >>> bogus > >>> lowest cid check and fix ompi_comm_register_cid() to properly remove cid > >>> from > >>> the list. > >>> > >>> Text files modified: > >>> trunk/ompi/communicator/comm_cid.c | 24 ++++++++++++------------ > >>> 1 files changed, 12 insertions(+), 12 deletions(-) > >>> > >>> Modified: trunk/ompi/communicator/comm_cid.c > >>> ============================================================================== > >>> --- trunk/ompi/communicator/comm_cid.c (original) > >>> +++ trunk/ompi/communicator/comm_cid.c 2007-09-11 09:23:46 EDT (Tue, > >>> 11 > >>> Sep 2007) > >>> @@ -11,6 +11,7 @@ > >>> * All rights reserved. > >>> * Copyright (c) 2006-2007 University of Houston. All rights reserved. > >>> * Copyright (c) 2007 Cisco, Inc. All rights reserved. > >>> + * Copyright (c) 2007 Voltaire All rights reserved. > >>> * $COPYRIGHT$ > >>> * > >>> * Additional copyrights may follow > >>> @@ -170,15 +171,6 @@ > >>> * This is the real algorithm described in the doc > >>> */ > >>> > >>> - OPAL_THREAD_LOCK(&ompi_cid_lock); > >>> - if (comm->c_contextid != ompi_comm_lowest_cid() ) { > >>> - /* if not lowest cid, we do not continue, but sleep and > >>> try again */ > >>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock); > >>> - continue; > >>> - } > >>> - OPAL_THREAD_UNLOCK(&ompi_cid_lock); > >>> - > >>> - > >>> for (i=start; i < mca_pml.pml_max_contextid ; i++) { > >>> > >>> flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators, > >>> i, comm); > >>> @@ -365,10 +357,18 @@ > >>> > >>> static int ompi_comm_unregister_cid (uint32_t cid) > >>> { > >>> - ompi_comm_reg_t *regcom=NULL; > >>> - opal_list_item_t > >>> *item=opal_list_remove_first(&ompi_registered_comms); > >>> + ompi_comm_reg_t *regcom; > >>> + opal_list_item_t *item; > >>> > >>> - regcom = (ompi_comm_reg_t *) item; > >>> + for (item = opal_list_get_first(&ompi_registered_comms); > >>> + item != opal_list_get_end(&ompi_registered_comms); > >>> + item = opal_list_get_next(item)) { > >>> + regcom = (ompi_comm_reg_t *)item; > >>> + if(regcom->cid == cid) { > >>> + opal_list_remove_item(&ompi_registered_comms, item); > >>> + break; > >>> + } > >>> + } > >>> OBJ_RELEASE(regcom); > >>> return OMPI_SUCCESS; > >>> } > >>> _______________________________________________ > >>> svn-full mailing list > >>> svn-f...@open-mpi.org > >>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full > > > > > > > >> _______________________________________________ > >> devel mailing list > >> de...@open-mpi.org > >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > -- > > Gleb. > > _______________________________________________ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > -- > Edgar Gabriel > Assistant Professor > Parallel Software Technologies Lab http://pstl.cs.uh.edu > Department of Computer Science University of Houston > Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA > Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335 > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Gleb.