On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote:
>
> On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote:
>
>> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote:
>>> We don't want to prevent two threads from entering the code at the same
>>> time. The algorithm you cited supports this case. There is only one
>>> moment that is
>> Are you sure it supports this case? There is a global variable
>> mask_in_use that prevents multiple access.
>
> I'm unable to find the mask_in_use global variable. Where is it?
I thought by "algorithm you cited" you meant the algorithm described in
the link I provided. There is a mask_in_use global variable there that IMO
ensures the algorithm is executed for only one communicator at a time.
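
To make this concrete, here is roughly the scheme from the paper as I
understand it (a sketch in C, not the paper's actual code; the mask size,
the names, and the pthread locking are mine):

    #include <mpi.h>
    #include <pthread.h>
    #include <string.h>

    #define MASK_SIZE 32  /* number of context ids tracked; illustrative */

    static pthread_mutex_t cid_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int mask[MASK_SIZE];   /* 1 = cid free on this process; init to all 1 */
    static int mask_in_use = 0;   /* only one thread may own the global mask */

    /* Collectively allocate a context id over 'comm' (sketch of sec. 5.3). */
    int alloc_context_id(MPI_Comm comm)
    {
        int local_mask[MASK_SIZE], i_own_the_mask, i;

        while (1) {
            pthread_mutex_lock(&cid_mutex);
            if (!mask_in_use) {
                /* we own the mask: advertise our view of the free cids */
                memcpy(local_mask, mask, sizeof(mask));
                mask_in_use = i_own_the_mask = 1;
            } else {
                /* mask is busy: contribute all zeros so that no cid can
                 * be agreed on in this round, and retry later */
                memset(local_mask, 0, sizeof(local_mask));
                i_own_the_mask = 0;
            }
            pthread_mutex_unlock(&cid_mutex);

            /* keep only the cids that are free on every process */
            MPI_Allreduce(MPI_IN_PLACE, local_mask, MASK_SIZE, MPI_INT,
                          MPI_BAND, comm);

            if (i_own_the_mask) {
                pthread_mutex_lock(&cid_mutex);
                mask_in_use = 0;
                for (i = 0; i < MASK_SIZE; i++) {
                    if (local_mask[i]) {  /* free everywhere: claim it */
                        mask[i] = 0;
                        pthread_mutex_unlock(&cid_mutex);
                        return i;
                    }
                }
                pthread_mutex_unlock(&cid_mutex);
            }
            /* no cid agreed on in this round; try again */
        }
    }

So a thread that loses the race still participates in the allreduce (with
zeros) instead of blocking; the collective still matches across processes,
but only the mask owner can actually pick a cid, and the others retry.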

>
>   george.
>
>>
>>> critical: the local selection of the next available cid. And this is
>>> what we try to protect there. If, after the first run, the collective
>>> call does not manage to figure out the correct next_cid, then we will
>>> execute the while loop again. And then this condition makes sense, as
>>> only the thread running on the communicator with the smallest cid will
>>> continue. This ensures that it will pick up the smallest next available
>>> cid, and then its reduce operation will succeed. The other threads will
>>> wait until the selection of the next available cid is unlocked.
>>>
>>> Without the code you removed we face a deadlock situation. Multiple
>>> threads will pick a different next_cid on each process and they will
>>> never succeed with the reduce operation. And this is what we're trying
>>> to avoid with the test.
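>>>
>>> Schematically, the test does this (simplified from comm_cid.c, with the
>>> surrounding loop sketched in; error handling omitted):
>>>
>>>     while (!done) {
>>>         OPAL_THREAD_LOCK(&ompi_cid_lock);
>>>         if (comm->c_contextid != ompi_comm_lowest_cid()) {
>>>             /* not the smallest registered cid: back off and retry,
>>>              * so the thread on the smallest cid picks the next cid
>>>              * and its reduce can succeed */
>>>             OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>             continue;
>>>         }
>>>         OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>
>>>         /* ... select a candidate cid and run the allreduce ... */
>>>     }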
>> OK, I think I now get the idea behind this test. I'll restore it and
>> leave the ompi_comm_unregister_cid() fix in place. Is this OK?
>>
>>>
>>>   george.
>>>
>>> On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote:
>>>
>>>> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote:
>>>>> Gleb,
>>>>>
>>>>> This patch is not correct. The code preventing the registration of
>>>>> the same communicator twice is later in the code (same file, in the
>>>>> function ompi_comm_register_cid, line 326). Once the function
>>>>> ompi_comm_register_cid
>>>> I saw this code and the comment. The problem is not with the same
>>>> communicator but with different communicators.
>>>>
>>>>> is called, we know that each communicator only handles one
>>>>> "communicator creation" function at a time. Therefore, we want to give
>>>>> priority to the smallest com_id, which is what happens in the code you
>>>>> removed.
>>>> The code I removed was doing it wrongly, i.e. the algorithm is
>>>> sometimes executed for different communicators simultaneously by
>>>> different threads. Think about the case where the function is running
>>>> for cid 1 and then another thread runs it for cid 0. cid 0 will proceed
>>>> although the function is already executing on another CPU. And this is
>>>> not just theoretical; it happens with Sun's thread test suite mpi_coll
>>>> test case.
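>>>>
>>>> The interleaving I mean is (the cids are just examples):
>>>>
>>>>     /* thread A registers a comm with cid 1; the lowest registered cid
>>>>      * is 1, so A's check passes and A enters the allocation loop.
>>>>      * thread B then registers a comm with cid 0; the lowest cid is now
>>>>      * 0, so B's check passes as well -> A and B run the algorithm for
>>>>      * two different communicators at the same time. */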
>>>>
>>>>>
>>>>> Without the condition in ompi_comm_register_cid (each communicator
>>>>> only gets registered once) your comment makes sense. However, with the
>>>>> condition, your patch allows a deadlock situation where 2 processes
>>>>> try to create communicators in multiple threads and never succeed,
>>>>> simply because they do not order the creation based on the com_id.
>>>> If the algorithm is really prone to deadlock when it is concurrently
>>>> executed for several different communicators (I haven't checked this),
>>>> then we may want to fix the original code to really prevent two threads
>>>> from entering the function, but then I don't see the reason for all
>>>> those complications with
>>>> ompi_comm_register_cid()/ompi_comm_unregister_cid(). The algorithm
>>>> described here:
>>>> http://209.85.129.104/search?q=cache:5PV5MMRkBWkJ:ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1382.pdf+MPI+communicator+dup+algorithm&hl=en&ct=clnk&cd=2
>>>> in section 5.3 works without it, and we can do something similar.
>>>>
>>>>>
>>>>>   george.
>>>>>
>>>>>
>>>>>
>>>>> On Sep 11, 2007, at 9:23 AM, g...@osl.iu.edu wrote:
>>>>>
>>>>>> Author: gleb
>>>>>> Date: 2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>>>> New Revision: 16088
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/16088
>>>>>>
>>>>>> Log:
>>>>>> The code tries to prevent itself from running for more than one
>>>>>> communicator simultaneously, but is doing it incorrectly. If the
>>>>>> function is already running for one communicator and it is called
>>>>>> from another thread for another communicator with a lower cid, the
>>>>>> check comm->c_contextid != ompi_comm_lowest_cid() will fail and the
>>>>>> function will be executed for two different communicators by two
>>>>>> threads simultaneously. There is nothing in the algorithm that
>>>>>> prevents it from being run simultaneously for different
>>>>>> communicators, as far as I can see, but ompi_comm_unregister_cid()
>>>>>> assumes that it is always called for the communicator with the lowest
>>>>>> cid, and this is not always the case. This patch removes the bogus
>>>>>> lowest-cid check and fixes ompi_comm_unregister_cid() to properly
>>>>>> remove the cid from the list.
>>>>>>
>>>>>> Text files modified:
>>>>>>    trunk/ompi/communicator/comm_cid.c |    24 ++++++++++++------------
>>>>>>    1 files changed, 12 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> Modified: trunk/ompi/communicator/comm_cid.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/communicator/comm_cid.c   (original)
>>>>>> +++ trunk/ompi/communicator/comm_cid.c   2007-09-11 09:23:46 EDT (Tue, 11 Sep 2007)
>>>>>> @@ -11,6 +11,7 @@
>>>>>>   *                         All rights reserved.
>>>>>>   * Copyright (c) 2006-2007 University of Houston. All rights reserved.
>>>>>>   * Copyright (c) 2007      Cisco, Inc.  All rights reserved.
>>>>>> + * Copyright (c) 2007      Voltaire All rights reserved.
>>>>>>   * $COPYRIGHT$
>>>>>>   *
>>>>>>   * Additional copyrights may follow
>>>>>> @@ -170,15 +171,6 @@
>>>>>>               * This is the real algorithm described in the doc
>>>>>>               */
>>>>>>
>>>>>> -            OPAL_THREAD_LOCK(&ompi_cid_lock);
>>>>>> -            if (comm->c_contextid != ompi_comm_lowest_cid() ) {
>>>>>> -                /* if not lowest cid, we do not continue, but sleep and try again */
>>>>>> -                OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>>>> -                continue;
>>>>>> -            }
>>>>>> -            OPAL_THREAD_UNLOCK(&ompi_cid_lock);
>>>>>> -
>>>>>> -
>>>>>>              for (i=start; i < mca_pml.pml_max_contextid ; i++) {
>>>>>>                  flag=ompi_pointer_array_test_and_set_item(&ompi_mpi_communicators,
>>>>>>                                                            i, comm);
>>>>>> @@ -365,10 +357,18 @@
>>>>>>
>>>>>>  static int ompi_comm_unregister_cid (uint32_t cid)
>>>>>>  {
>>>>>> -    ompi_comm_reg_t *regcom=NULL;
>>>>>> -    opal_list_item_t *item=opal_list_remove_first(&ompi_registered_comms);
>>>>>> +    ompi_comm_reg_t *regcom;
>>>>>> +    opal_list_item_t *item;
>>>>>>
>>>>>> -    regcom = (ompi_comm_reg_t *) item;
>>>>>> +    for (item = opal_list_get_first(&ompi_registered_comms);
>>>>>> +         item != opal_list_get_end(&ompi_registered_comms);
>>>>>> +         item = opal_list_get_next(item)) {
>>>>>> +        regcom = (ompi_comm_reg_t *)item;
>>>>>> +        if(regcom->cid == cid) {
>>>>>> +            opal_list_remove_item(&ompi_registered_comms, item);
>>>>>> +            break;
>>>>>> +        }
>>>>>> +    }
>>>>>>      OBJ_RELEASE(regcom);
>>>>>>      return OMPI_SUCCESS;
>>>>>>  }

--
                        Gleb.
