Found my original fix (still don't know why I never pushed it), and I think
George is correct. This should work in both the single and multiple get cases.
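
To illustrate the single vs. multiple get point, here is a minimal standalone
sketch. It is not the ob1 code: the struct, the field names, and both
functions are made up for illustration, and it assumes the gets complete one
after another with no threading. It only models the comparison being
discussed, where FIN goes out once the chosen target is <= bytes received.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a receive request. */
struct recv_request {
    size_t bytes_expected;  /* size the receiver posted */
    size_t bytes_packed;    /* size the sender actually delivers */
    size_t bytes_received;  /* running total over completed gets */
};

/* Models one get completion. Returns 1 if FIN would be sent now. */
static int rget_fragment_done(struct recv_request *req, size_t rdma_length,
                              int use_packed)
{
    size_t target;

    req->bytes_received += rdma_length;
    target = use_packed ? req->bytes_packed : req->bytes_expected;
    return target <= req->bytes_received;
}

/* Delivers `packed` bytes as `nfrags` gets. Returns the 1-based index of
 * the get whose completion triggers FIN, or 0 if FIN is never sent. */
static int fin_after(size_t expected, size_t packed, int nfrags,
                     int use_packed)
{
    struct recv_request req = { expected, packed, 0 };
    size_t chunk;
    int i;

    assert(nfrags > 0);
    chunk = packed / (size_t)nfrags;
    for (i = 1; i <= nfrags; i++) {
        /* the last get carries whatever remains */
        size_t len = (i == nfrags)
                         ? packed - chunk * (size_t)(nfrags - 1)
                         : chunk;
        if (rget_fragment_done(&req, len, use_packed)) {
            return i;
        }
    }
    return 0;
}
```

With a receiver posting 2048 bytes and a sender delivering 1024,
fin_after(2048, 1024, 4, 0) returns 0 (FIN is never sent, which is the hang),
while fin_after(2048, 1024, 4, 1) returns 4 (FIN after the last get). The
single get case behaves the same way with nfrags set to 1.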

-Nathan

On Fri, Jun 21, 2013 at 05:52:28PM +0200, George Bosilca wrote:
> The number of bytes received is atomically updated in the completion 
> callback, and the completion test is clearly spelled out in the 
> recv_request_pml_complete_check function (of course minus the lock part). 
> Rolf, I think your patch is correct.
> 
> That being said, req_bytes_expected is a special value, one that should only 
> be used to check for truncation. Otherwise, req_bytes_packed is the value we 
> should compare against.
> 
>   George.
> 
> On Jun 21, 2013, at 17:40 , Nathan Hjelm <hje...@lanl.gov> wrote:
> 
> > I thought I fixed this problem a while back (though looking at the code, 
> > it's possible I never committed the fix). I will have to look through my 
> > local repository and see what happened to that fix. Your fix might not 
> > work correctly, since an RGET can be broken up into multiple get 
> > operations. It may work; I would just need to test it to make sure.
> > 
> > -Nathan
> > 
> > On Fri, Jun 21, 2013 at 08:25:29AM -0700, Rolf vandeVaart wrote:
> >> I ran into a hang in a test in which the sender sends less data than the 
> >> receiver is expecting.  For example, the following shows the receiver 
> >> expecting twice what the sender is sending.
> >> 
> >> Rank 0:  MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
> >> Rank 1: MPI_Recv(buf, BUFSIZE*2,  MPI_INT, 0, 99, MPI_COMM_WORLD)
> >> 
> >> This is also reproducible using one of the Intel tests and adjusting the 
> >> eager limit for the openib BTL.
> >> 
> >>    mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 \
> >>        MPI_Send_overtake_c
> >> 
> >> In most cases, this works just fine. However, when the PML protocol used 
> >> is the RGET protocol, the test hangs. Below is a proposed fix for this 
> >> issue. I believe we want to be checking against req_bytes_packed rather 
> >> than req_bytes_expected, as req_bytes_expected is what the user 
> >> originally told us. Otherwise, with the current code, we never send a 
> >> FIN message back to the sender.
> >> 
> >> Any thoughts?
> >> 
> >> [rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
> >> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> >> ===================================================================
> >> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c        (revision 28633)
> >> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c     (working copy)
> >> @@ -335,7 +335,7 @@
> >>     /* is receive request complete */
> >>     OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
> >> -    if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
> >> +    if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
> >>         mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
> >>                               bml_btl,
> >>                                      frag->rdma_hdr.hdr_rget.hdr_des,
> >> 
> >> 
> >> 
> > 
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > 
> 
> 
