I thought I fixed this problem a while back (though looking at the code, it's possible I never committed the fix). I will have to look through my local repository and see what happened to that fix. Your fix might not work correctly since an RGET can be broken up into multiple get operations. It may work; I would just need to test it to make sure.
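
To make the multiple-get concern concrete, here is a minimal sketch of the completion check. It is not the actual ob1 code: the struct, the function names (sketch_recvreq, sketch_get_done, sketch_count_fins) and the fixed fragment size are all hypothetical, and only the three byte counters discussed in this thread are modeled. It assumes one RGET is split into gets of at most max_frag bytes and counts how often the "send FIN" branch would be taken, depending on which counter the running total is compared against.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the receive request; field names only
 * loosely mirror the counters named in the diff. */
struct sketch_recvreq {
    size_t bytes_expected;  /* size the receiver posted (user's count) */
    size_t bytes_packed;    /* size the sender is actually sending */
    size_t bytes_received;  /* running total over completed gets */
};

/* Account for one completed get fragment and report whether the
 * request counts as complete when compared against `limit`. */
static bool sketch_get_done(struct sketch_recvreq *req, size_t frag_len,
                            size_t limit)
{
    req->bytes_received += frag_len;
    return limit <= req->bytes_received;
}

/* Split the sender's message into gets of at most max_frag bytes and
 * return how many times the FIN branch would be taken.
 * use_packed selects the proposed check (bytes_packed) instead of the
 * current one (bytes_expected). */
static int sketch_count_fins(size_t expected, size_t packed,
                             size_t max_frag, bool use_packed)
{
    struct sketch_recvreq req = { expected, packed, 0 };
    size_t remaining = packed;
    int fins = 0;

    if (max_frag == 0) {
        return 0;  /* nothing can be transferred; avoid looping forever */
    }
    while (remaining > 0) {
        size_t len = remaining < max_frag ? remaining : max_frag;
        size_t limit = use_packed ? req.bytes_packed : req.bytes_expected;
        if (sketch_get_done(&req, len, limit)) {
            fins++;
        }
        remaining -= len;
    }
    return fins;
}
```

With a receiver posting 800 bytes and a sender sending 400 bytes in 100-byte gets, sketch_count_fins(800, 400, 100, false) never reaches the FIN branch, which matches the reported hang, while sketch_count_fins(800, 400, 100, true) reaches it once, after the last get. Whether the real code behaves the same way when the gets complete out of order or concurrently is exactly what would need testing.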

-Nathan

On Fri, Jun 21, 2013 at 08:25:29AM -0700, Rolf vandeVaart wrote:
> I ran into a hang in a test in which the sender sends less data than the
> receiver is expecting. For example, the following shows the receiver
> expecting twice what the sender is sending.
>
> Rank 0: MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
> Rank 1: MPI_Recv(buf, BUFSIZE*2, MPI_INT, 0, 99, MPI_COMM_WORLD)
>
> This is also reproducible using one of the intel tests and adjusting the
> eager value for the openib BTL.
>
>   mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 MPI_Send_overtake_c
>
> In most cases, this works just fine. However, when the PML protocol used is
> the RGET protocol, the test hangs. Below is a proposed fix for this issue.
> I believe we want to be checking against req_bytes_packed rather than
> req_bytes_expected as req_bytes_expected is what the user originally told us.
> Otherwise, with the current code, we never send a FIN message back to the
> sender.
>
> Any thoughts?
>
> [rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===================================================================
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c (revision 28633)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c (working copy)
> @@ -335,7 +335,7 @@
>      /* is receive request complete */
>      OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
> -    if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
> +    if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
>          mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
>                               bml_btl,
>                               frag->rdma_hdr.hdr_rget.hdr_des,