Thanks for the suggestion. Unfortunately, things are never as simple as
they look. The original code was wrong for weeks/months. This unlock is on
an execution path that would require an MPI_Cancel happening in exactly the
same time as the PML completes the same receive request, or in other words
an execution path that is not likely to happen, and certainly not covered
by any of our tests.

 George.



On Thu, Apr 7, 2016 at 2:53 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> In the same spirit as Ralph intended, I want to suggest that code changes
> be *run* before pushing to master.
> If necessary, add temporary debugging output to help determine that your
> changes are reached in testing.
> This can save one the public embarrassment of having a thread like one
> this dedicated to your typos! ;-)
>
> In this particular instance I suspect that any MPI test program that could
> reach the changed code would have deadlocked.
>
> -Paul
>
> On Thu, Apr 7, 2016 at 11:42 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Apologies - didn’t mean to make it sound that way. Just wanted to inform
>> a new person on the recommended procedures in case they were unaware of
>> them.
>>
>>
>> On Apr 7, 2016, at 11:07 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> Let's not blown this out of proportion, it was nothing more than a typo
>> pinpointed and fixed in a matter of seconds.
>>
>>   George.
>>
>>
>> On Thu, Apr 7, 2016 at 1:58 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Just as a suggestion: please express such changes in the form of a Pull
>>> Request instead of a direct commit to avoid getting such mistakes into the
>>> code base.
>>>
>>> I’m not advocating it for truly trivial stuff - but changing the
>>> thread_unlock to an OB1 call probably should be given a chance for comment
>>>
>>> > On Apr 7, 2016, at 10:45 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
>>> >
>>> >
>>> > Hah, just caught that as well. Commented on the commit on
>>> > github. Definitely looks wrong.
>>> >
>>> > -Nathan
>>> >
>>> > On Thu, Apr 07, 2016 at 05:43:17PM +0000, Dave Goodell (dgoodell)
>>> wrote:
>>> >> [inline]
>>> >>
>>> >> On Apr 7, 2016, at 12:53 PM, git...@crest.iu.edu wrote:
>>> >>>
>>> >>> This is an automated email from the git hooks/post-receive script.
>>> It was
>>> >>> generated because a ref change was pushed to the repository
>>> containing
>>> >>> the project "open-mpi/ompi".
>>> >>>
>>> >>> The branch, master has been updated
>>> >>>      via  92290b94e0584271d6459a6ab5923a04125e23be (commit)
>>> >>>     from  7cdf50533cf940258072f70231a4a456fa73d2f8 (commit)
>>> >>>
>>> >>> Those revisions listed above that are new to this repository have
>>> >>> not appeared on any other notification email; so we list those
>>> >>> revisions in full, below.
>>> >>>
>>> >>> - Log
>>> -----------------------------------------------------------------
>>> >>>
>>> https://github.com/open-mpi/ompi/commit/92290b94e0584271d6459a6ab5923a04125e23be
>>> >>>
>>> >>> commit 92290b94e0584271d6459a6ab5923a04125e23be
>>> >>> Author: Thananon Patinyasakdikul <tpati...@utk.edu>
>>> >>> Date:   Wed Apr 6 14:26:04 2016 -0400
>>> >>>
>>> >>>   Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN)
>>> >>>
>>> >>> diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
>>> b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
>>> >>> index 9d1d402..a912bc3 100644
>>> >>> --- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
>>> >>> +++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
>>> >>> @@ -106,7 +106,7 @@ static int
>>> mca_pml_ob1_recv_request_cancel(struct ompi_request_t* ompi_request,
>>> >>>    /* The rest should be protected behind the match logic lock */
>>> >>>    OB1_MATCHING_LOCK(&ob1_comm->matching_lock);
>>> >>>    if( true == request->req_match_received ) { /* way to late to
>>> cancel this one */
>>> >>> -        OPAL_THREAD_UNLOCK(&ob1_comm->matching_lock);
>>> >>> +        OB1_MATCHING_LOCK(&ob1_comm->matching_lock);
>>> >>
>>> >> I've only taken a cursory look, but this looks very wrong to me.
>>> Shouldn't you be using the "OB1_MATCHING_UNLOCK" macro instead?  Doubly
>>> locking the lock will almost certainly lead to sadness.
>>> >>
>>> >>>        assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG );
>>> /* not matched isn't it */
>>> >>>        return OMPI_SUCCESS;
>>> >>>    }
>>> >>> diff --git a/opal/mca/btl/tcp/btl_tcp.h b/opal/mca/btl/tcp/btl_tcp.h
>>> >>> index f2c8917..7e9d726 100644
>>> >>> --- a/opal/mca/btl/tcp/btl_tcp.h
>>> >>> +++ b/opal/mca/btl/tcp/btl_tcp.h
>>> >>> @@ -96,7 +96,7 @@ extern int mca_btl_tcp_progress_thread_trigger;
>>> >>>    do {
>>>   \
>>> >>>        if(0 < mca_btl_tcp_progress_thread_trigger) {
>>>    \
>>> >>>            opal_event_t* _event = (opal_event_t*)(event);
>>>       \
>>> >>> -            opal_fd_write( mca_btl_tcp_pipe_to_progress[1],
>>> sizeof(opal_event_t*), \
>>> >>> +            (void) opal_fd_write( mca_btl_tcp_pipe_to_progress[1],
>>> sizeof(opal_event_t*), \
>>> >>
>>> >> Seems better to capture the return value and at least put an assert()
>>> on it if it fails, though admittedly things are very screwed up if you
>>> overrun the pipe.
>>> >>
>>> >> -Dave
>>> >>
>>> >> _______________________________________________
>>> >> devel mailing list
>>> >> de...@open-mpi.org
>>> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> >> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/04/18739.php
>>> > _______________________________________________
>>> > devel mailing list
>>> > de...@open-mpi.org
>>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> > Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/04/18740.php
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/04/18741.php
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/04/18742.php
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2016/04/18743.php
>>
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/04/18744.php
>

Reply via email to