On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
> We don't want to prevent two threads from entering the code at the same
> time. The algorithm you cited supports this case. There is only one moment that is
Are you sure it supports this case? There is a global variable mask_in_use
that prevents multiple access.

> critical: the local selection of the next available cid, and that is what 
> we try to protect there. If after the first run the collective call does not 
> manage to figure out the correct next_cid, we will execute the while 
> loop again. Then this condition makes sense, as only the thread running 
> on the smallest communicator cid will continue. This ensures that it will 
> pick up the smallest next available cid, and then its reduce operation will 
> succeed. The other threads will wait until the selection of the next 
> available cid is unlocked.
>
> Without the code you removed we face a deadlock situation: multiple threads 
> will pick different next_cids on each process and they will never succeed 
> with the reduce operation. That is what we are trying to avoid with the 
> test.
OK, I think I now get the idea behind this test. I'll restore it and
leave the ompi_comm_unregister_cid() fix in place. Is this OK?

>
>   george.
>
> On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
>>> Gleb,
>>>
>>> This patch is not correct. The code preventing the registration of the same
>>> communicator twice is later in the same file, in the function
>>> ompi_comm_register_cid (line 326). Once the function
>>> ompi_comm_register_cid
>> I saw this code and the comment. The problem is not with the same
>> communicator but with different communicators.
>>
>>> is called, we know that each communicator only handles one "communicator
>>> creation" function at a time. Therefore, we want to give priority to
>>> the smallest com_id, which is what happens in the code you removed.
>> The code I removed was doing it incorrectly, i.e. the algorithm is sometimes
>> executed for different communicators simultaneously by different threads.
>> Think about the case where the function is running for cid 1 and then another
>> thread runs it for cid 0. cid 0 will proceed although the function is
>> being executed on another CPU. And this is not something theoretical; it
>> is happening with Sun's thread test suite's mpi_coll test case.
>>
>>>
>>> Without the condition in ompi_comm_register_cid (each communicator
>>> only gets registered once) your comment makes sense. However, with the
>>> condition, your patch allows a deadlock situation in which two processes
>>> try to create communicators in multiple threads and will never succeed,
>>> simply because they will not order the creation based on the com_id.
>> If the algorithm is really prone to deadlock when it is concurrently
>> executed for several different communicators (I haven't checked this),
>> then we may want to fix the original code to really prevent two threads
>> from entering the function. But then I don't see the reason for all those
>> complications with ompi_comm_register_cid()/ompi_comm_unregister_cid().
>> The algorithm described here:
>> http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2
>> in section 5.3 works without it, and we can do something similar.
>>
>>>
>>>   george.
>>>
>>>
>>>
>>> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:
>>>
>>>> Author: gleb
>>>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>> New Revision: 16088
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>>>>
>>>> Log:
>>>> The code tries to prevent itself from running for more than one
>>>> communicator
>>>> simultaneously, but is doing it incorrectly. If the function is already
>>>> running
>>>> for one communicator and it is called from another thread for another
>>>> communicator
>>>> with a lower cid, the check comm->c_contextid != ompi_comm_lowest_cid()
>>>> will fail and the function will be executed for two different
>>>> communicators by
>>>> two threads simultaneously. There is nothing in the algorithm that
>>>> prevents
>>>> it
>>>> from running simultaneously for different communicators as far as I
>>>> can see,
>>>> but ompi_comm_unregister_cid() assumes that it is always called for the
>>>> communicator
>>>> with the lowest cid, and this is not always the case. This patch removes
>>>> the bogus
>>>> lowest-cid check and fixes ompi_comm_unregister_cid() to properly remove
>>>> the cid from
>>>> the list.
>>>>
>>>> Text files modified:
>>>>    trunk/ompi/communicator/comm_cid.c |    24 ++++++++++++------------
>>>>    1 files changed, 12 insertions(+), 12 deletions(-)
>>>>
>>>> Modified: trunk/ompi/communicator/comm_cid.c
>>>> ==============================================================================
>>>> --- trunk/ompi/communicator/comm_cid.c     (original)
>>>> +++ trunk/ompi/communicator/comm_cid.c     2007-09-11 09:23:46 EDT (Tue, 11
>>>> Sep 2007)
>>>> @@ -11,6 +11,7 @@
>>>>   *                         All rights reserved.
>>>>   * Copyright (c) 2006-2007 University of Houston. All rights reserved.
>>>>   * Copyright (c) 2007      Cisco, Inc.  All rights reserved.
>>>> + * Copyright (c) 2007      Voltaire All rights reserved.
>>>>   * $COPYRIGHT$
>>>>   *
>>>>   * Additional copyrights may follow
>>>> @@ -170,15 +171,6 @@
>>>>               * This is the real algorithm described in the doc
>>>>               */
>>>>
>>>> -            OPAL_THREAD_LOCK(&ompi_cid_lock);
>>>> -            if (comm->c_contextid != ompi_comm_lowest_cid() ) {
>>>> -                /* if not lowest cid, we do not continue, but sleep and
>>>> try again */
>>>> -                OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>> -                continue;
>>>> -            }
>>>> -            OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>> -
>>>> -
>>>>              for (i=start; i < mca_pml.pml_max_contextid ; i++) {
>>>>
>>>> flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators,
>>>>                                                            i, comm);
>>>> @@ -365,10 +357,18 @@
>>>>
>>>>  static int ompi_comm_unregister_cid (uint32_t cid)
>>>>  {
>>>> -    ompi_comm_reg_t *regcom=NULL;
>>>> -    opal_list_item_t
>>>> *item=opal_list_remove_first(&ompi_registered_comms);
>>>> +    ompi_comm_reg_t *regcom;
>>>> +    opal_list_item_t *item;
>>>>
>>>> -    regcom = (ompi_comm_reg_t *) item;
>>>> +    for (item = opal_list_get_first(&ompi_registered_comms);
>>>> +         item != opal_list_get_end(&ompi_registered_comms);
>>>> +         item = opal_list_get_next(item)) {
>>>> +        regcom = (ompi_comm_reg_t *)item;
>>>> +        if(regcom->cid == cid) {
>>>> +            opal_list_remove_item(&ompi_registered_comms, item);
>>>> +            break;
>>>> +        }
>>>> +    }
>>>>      OBJ_RELEASE(regcom);
>>>>      return OMPI_SUCCESS;
>>>>  }
>>>> _______________________________________________
>>>> svn-full mailing list
>>>> svn-f...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>>>
>>
>>
>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> --
>>                      Gleb.
>

--
                        Gleb.
