I thought I fixed this problem a while back (though looking at the code, it's 
possible I never committed the fix). I will have to look through my local 
repository and see what happened to that fix. Your fix might not work correctly, 
since an RGET can be broken up into multiple get operations. It may work; I 
would just need to test it to make sure.
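
To make the multiple-get concern concrete, here is a minimal, self-contained 
sketch of the completion check. It is a toy model, not the ob1 code: the struct 
sim_recvreq_t and the functions sim_rget_completion and sim_transfer are 
hypothetical names, and the sketch assumes that bytes_received accumulates 
across every get belonging to one request. It only counts how many times the 
"send FIN" condition would become true under each comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a receive request. Only the three byte
 * counters discussed in this thread are modeled. */
typedef struct {
    size_t bytes_expected;  /* size the receiver posted (user's count)   */
    size_t bytes_packed;    /* size the sender actually has to deliver   */
    size_t bytes_received;  /* running total over all completed gets     */
} sim_recvreq_t;

/* Model one RGET get completion of frag_len bytes.
 * Returns true if the completion check would trigger a FIN.
 * use_packed selects which counter the running total is compared with. */
static bool sim_rget_completion(sim_recvreq_t *req, size_t frag_len,
                                bool use_packed)
{
    size_t target = use_packed ? req->bytes_packed : req->bytes_expected;

    req->bytes_received += frag_len;
    return target <= req->bytes_received;
}

/* Model a whole transfer of `packed` bytes, split into gets of at most
 * max_frag bytes (max_frag must be nonzero), and count how many times
 * the FIN condition holds. */
static int sim_transfer(size_t expected, size_t packed, size_t max_frag,
                        bool use_packed)
{
    sim_recvreq_t req = { expected, packed, 0 };
    size_t remaining = packed;
    int fins = 0;

    while (remaining > 0) {
        size_t len = remaining < max_frag ? remaining : max_frag;

        if (sim_rget_completion(&req, len, use_packed)) {
            fins++;
        }
        remaining -= len;
    }
    return fins;
}
```

With a receiver expecting 2048 bytes and a sender delivering 1024 bytes in 
256-byte gets, sim_transfer(2048, 1024, 256, false) never reaches the FIN 
condition, while sim_transfer(2048, 1024, 256, true) reaches it once, on the 
last get. Whether the real code path behaves the same way when an RGET is split 
is exactly what would still need testing.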

-Nathan

On Fri, Jun 21, 2013 at 08:25:29AM -0700, Rolf vandeVaart wrote:
> I ran into a hang in a test in which the sender sends less data than the 
> receiver is expecting.  For example, the following shows the receiver 
> expecting twice what the sender is sending.
> 
> Rank 0:  MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
> Rank 1: MPI_Recv(buf, BUFSIZE*2,  MPI_INT, 0, 99, MPI_COMM_WORLD)
> 
> This is also reproducible using one of the intel tests and adjusting the 
> eager value for the openib BTL.
> 
> mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c
> 
> In most cases, this works just fine.  However, when the PML protocol used is 
> the RGET protocol, the test hangs.   Below is a proposed fix for this issue.
> I believe we want to be checking against req_bytes_packed rather than 
> req_bytes_expected as req_bytes_expected is what the user originally told us.
> Otherwise, with the current code, we never send a FIN message back to the 
> sender.
> 
> Any thoughts?
> 
> [rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===================================================================
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c        (revision 28633)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c     (working copy)
> @@ -335,7 +335,7 @@
>      /* is receive request complete */
>      OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
> -    if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
> +    if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
>          mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
>                                bml_btl,
>                                       frag->rdma_hdr.hdr_rget.hdr_des,
> 
> 
> 
