Does this need to go to v1.6?

On Jun 21, 2013, at 11:59 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> Found my original fix (still don't know why I never pushed it) and I think 
> George is correct. This should work in both the single and multiple get cases.
> 
> -Nathan
> 
> On Fri, Jun 21, 2013 at 05:52:28PM +0200, George Bosilca wrote:
>> The amount of bytes received is atomically updated on the completion 
>> callback, and the completion test is clearly spelled out in the 
>> recv_request_pml_complete_check function (of course minus the lock part). 
>> Rolf I think your patch is correct.
>> 
>> That being said, req_bytes_expected is a special value, one that should only 
>> be used to check for truncation. Otherwise, req_bytes_packed is the value 
>> we should compare against.
>> 
>>  George.
>> 
>> On Jun 21, 2013, at 17:40 , Nathan Hjelm <hje...@lanl.gov> wrote:
>> 
>>> I thought I fixed this problem a while back (though looking at the code, it's 
>>> possible I never committed the fix). I will have to look through my local 
>>> repository and see what happened to that fix. Your fix might not work 
>>> correctly, since an RGET can be broken up into multiple get operations. It 
>>> may work; I would just need to test it to make sure.
>>> 
>>> -Nathan
>>> 
>>> On Fri, Jun 21, 2013 at 08:25:29AM -0700, Rolf vandeVaart wrote:
>>>> I ran into a hang in a test in which the sender sends less data than the 
>>>> receiver is expecting.  For example, the following shows the receiver 
>>>> expecting twice what the sender is sending.
>>>> 
>>>> Rank 0: MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
>>>> Rank 1: MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE)
>>>> 
>>>> This is also reproducible using one of the intel tests and adjusting the 
>>>> eager value for the openib BTL.
>>>> 
>>>>    mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c
>>>> 
>>>> In most cases, this works just fine.  However, when the PML protocol used 
>>>> is the RGET protocol, the test hangs.  Below is a proposed fix for this 
>>>> issue.  I believe we want to be checking against req_bytes_packed rather 
>>>> than req_bytes_expected, as req_bytes_expected is what the user originally 
>>>> told us.  Otherwise, with the current code, we never send a FIN message 
>>>> back to the sender.
>>>> 
>>>> Any thoughts?
>>>> 
>>>> [rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
>>>> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
>>>> ===================================================================
>>>> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c        (revision 28633)
>>>> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c     (working copy)
>>>> @@ -335,7 +335,7 @@
>>>>    /* is receive request complete */
>>>>    OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
>>>> -    if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
>>>> +    if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
>>>>        mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
>>>>                              bml_btl,
>>>>                                     frag->rdma_hdr.hdr_rget.hdr_des,
>>>> 
>>>> 
>>>> 
>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
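
For readers following the thread, the completion check under discussion can be sketched outside of Open MPI. This is only a simplified model, not the ob1 code: the struct `recv_request`, its field names, and the helper `simulate_rget` are hypothetical stand-ins for `req_bytes_expected`, `req_recv.req_bytes_packed`, and `req_bytes_received`, and the model assumes the RGET is split into equal-sized get operations with no threading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the ob1 receive request. */
struct recv_request {
    size_t bytes_expected;  /* what the receiver asked for (user buffer size) */
    size_t bytes_packed;    /* what the sender actually has to deliver */
    size_t bytes_received;  /* running total, bumped as each get completes */
};

/* Original check: compares against the user-supplied size. */
static bool complete_vs_expected(const struct recv_request *r)
{
    return r->bytes_expected <= r->bytes_received;
}

/* Proposed check: compares against the size the sender is really sending. */
static bool complete_vs_packed(const struct recv_request *r)
{
    return r->bytes_packed <= r->bytes_received;
}

/*
 * Model an RGET that is broken into nfrags get operations.
 * Returns true if the completion test ever fires, i.e. if the
 * receiver would send the FIN message back to the sender.
 */
static bool simulate_rget(size_t expected, size_t packed,
                          size_t nfrags, bool use_packed)
{
    struct recv_request r = { expected, packed, 0 };
    size_t frag_len;
    bool fin_sent = false;

    assert(nfrags > 0);
    frag_len = packed / nfrags;

    for (size_t i = 0; i < nfrags; i++) {
        /* The last get picks up whatever the even split left over. */
        size_t len = (i == nfrags - 1) ? packed - frag_len * (nfrags - 1)
                                       : frag_len;
        r.bytes_received += len;

        if (use_packed ? complete_vs_packed(&r) : complete_vs_expected(&r)) {
            fin_sent = true;
        }
    }
    return fin_sent;
}
```

In this model, calling `simulate_rget(8192, 4096, 1, false)` mirrors the reported case (receiver expects twice what is sent) and never reaches the FIN, while the same call with `use_packed` set to `true` does, for one get or several.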


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

