I don't think so. The Mellanox change that caused this issue should not be in 
1.6.

-Nathan

On Fri, Jun 21, 2013 at 05:18:16PM +0000, Jeff Squyres (jsquyres) wrote:
> Does this need to go to v1.6?
> 
> On Jun 21, 2013, at 11:59 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
> 
> > Found my original fix (still don't know why I never pushed it) and I think 
> > George is correct. This should work in both the single and multiple get cases.
> > 
> > -Nathan
> > 
> > On Fri, Jun 21, 2013 at 05:52:28PM +0200, George Bosilca wrote:
> >> The number of bytes received is atomically updated in the completion 
> >> callback, and the completion test is clearly spelled out in the 
> >> recv_request_pml_complete_check function (minus the lock part, of course). 
> >> Rolf, I think your patch is correct.
> >> 
> >> That being said, req_bytes_expected is a special value, one that should 
> >> only be used to check for truncation. Otherwise, req_bytes_packed is the 
> >> value we should compare against.
> >> 
> >>  George.
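
The distinction George draws can be sketched in a few lines of C. This is a minimal model, not the ob1 code: the struct `recv_model_t` and both helper functions are hypothetical names made up for illustration, and the assumption is that `bytes_packed` holds the size of the message the sender actually sends while `bytes_expected` holds the size of the buffer the user posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sizes discussed above; this is NOT the
 * real mca_pml_ob1_recv_request_t layout. */
typedef struct {
    size_t bytes_expected; /* size of the buffer the user posted */
    size_t bytes_packed;   /* size of the message the sender sends (assumed) */
    size_t bytes_received; /* running total, updated as data arrives */
} recv_model_t;

/* Completion test: compare what has arrived against what the sender
 * will send, not against what the receiver asked for. */
bool recv_model_is_complete(const recv_model_t *req)
{
    assert(req != NULL);
    return req->bytes_received >= req->bytes_packed;
}

/* Truncation test: the only place the posted buffer size is consulted.
 * The message is truncated when it does not fit in the posted buffer. */
bool recv_model_is_truncated(const recv_model_t *req)
{
    assert(req != NULL);
    return req->bytes_packed > req->bytes_expected;
}
```

With a sender that sends 400 bytes into an 800-byte receive, the model reports the request complete once 400 bytes have arrived and never reports truncation. Comparing against `bytes_expected` instead would leave the request waiting for bytes that are never sent.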
> >> 
> >> On Jun 21, 2013, at 17:40 , Nathan Hjelm <hje...@lanl.gov> wrote:
> >> 
> >>> I thought I fixed this problem a while back (though looking at the code, 
> >>> it's possible I never committed the fix). I will have to look through my 
> >>> local repository and see what happened to that fix. Your fix might not 
> >>> work correctly since an RGET can be broken up into multiple get 
> >>> operations. It may work; I would just need to test it to make sure.
> >>> 
> >>> -Nathan
> >>> 
> >>> On Fri, Jun 21, 2013 at 08:25:29AM -0700, Rolf vandeVaart wrote:
> >>>> I ran into a hang in a test in which the sender sends less data than the 
> >>>> receiver is expecting.  For example, the following shows the receiver 
> >>>> expecting twice what the sender is sending.
> >>>> 
> >>>> Rank 0: MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
> >>>> Rank 1: MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE)
> >>>> 
> >>>> This is also reproducible using one of the intel tests and adjusting the 
> >>>> eager value for the openib BTL.
> >>>> 
> >>>> mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c
> >>>> 
> >>>> In most cases, this works just fine. However, when the PML uses the RGET 
> >>>> protocol, the test hangs. Below is a proposed fix for this issue. I believe 
> >>>> we want to check against req_bytes_packed rather than req_bytes_expected, 
> >>>> because req_bytes_expected is the size the user originally told us to 
> >>>> expect. Otherwise, with the current code, we never send a FIN message back 
> >>>> to the sender.
> >>>> 
> >>>> Any thoughts?
> >>>> 
> >>>> [rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
> >>>> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> >>>> ===================================================================
> >>>> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c        (revision 28633)
> >>>> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c     (working copy)
> >>>> @@ -335,7 +335,7 @@
> >>>>    /* is receive request complete */
> >>>>    OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
> >>>> -    if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
> >>>> +    if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
> >>>>        mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
> >>>>                              bml_btl,
> >>>>                                     frag->rdma_hdr.hdr_rget.hdr_des,
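
To make the effect of the one-line change in the diff easier to see, including the case raised in this thread where an RGET is split into several get operations, here is a small standalone model. It is only a sketch under stated assumptions: `rget_model_t`, `rget_model_frag_done` and `rget_model_run` are hypothetical names, the C11 `atomic_fetch_add` stands in for `OPAL_THREAD_ADD_SIZE_T`, the FIN is reduced to a counter, and the get operations are assumed to be equal-sized and to complete one after another.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a receive request serviced by RGET. */
typedef struct {
    size_t bytes_expected;        /* size the receiver posted */
    size_t bytes_packed;          /* size the sender actually sends (assumed) */
    atomic_size_t bytes_received; /* updated by each get completion */
    int fins_sent;                /* stand-in for sending a FIN message */
} rget_model_t;

/* Model of one get completion callback. use_packed selects the patched
 * comparison (true) or the original one (false). */
void rget_model_frag_done(rget_model_t *req, size_t rdma_length, bool use_packed)
{
    /* atomic_fetch_add returns the old value, so add the length again
     * to obtain the updated total. */
    size_t received = atomic_fetch_add(&req->bytes_received, rdma_length) + rdma_length;
    size_t target = use_packed ? req->bytes_packed : req->bytes_expected;
    if (target <= received) {
        req->fins_sent++;
    }
}

/* Run one RGET split into nfrags equal get operations and return how
 * many FINs the model would have sent. */
int rget_model_run(size_t packed, size_t expected, size_t nfrags, bool use_packed)
{
    assert(nfrags > 0 && packed % nfrags == 0);
    rget_model_t req = { .bytes_expected = expected, .bytes_packed = packed, .fins_sent = 0 };
    atomic_init(&req.bytes_received, 0);
    for (size_t i = 0; i < nfrags; i++) {
        rget_model_frag_done(&req, packed / nfrags, use_packed);
    }
    return req.fins_sent;
}
```

In this model, a receiver that posts twice what the sender sends never triggers a FIN with the original comparison, which matches the hang described above. With the patched comparison the FIN count is exactly one, whether the transfer is a single get or several. The model says nothing about concurrent completions or about which descriptor the real FIN should carry, so it does not replace testing the multiple-get path.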
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> de...@open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
