Brian,

Assuming all processes are running the same code as below, I think the user program is incorrect and you were just getting lucky with the other implementations.
Specifically, there's nothing stopping the Rsend from one process from reaching the other process before that process has posted the corresponding recv. For example, the receiving process might still be in its second Wait from the previous iteration and not yet have posted its next receive.

-- Pavan

On Nov 7, 2018, at 12:09 PM, Smith, Brian E. via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

Hi all,

(Trying again; I thought this address was subscribed to the list, but maybe not. Sorry if this is a duplicate.)

I have user-provided code that uses persistent ready sends. (Don't ask; I don't have an answer to "why?". Maybe it actually helped on some machine at some point in the past?) Anyway, the app fails on Titan fairly consistently (95+% failure rate) but works on most other platforms (BGQ, Summit, a generic OMPI cluster, a generic Intel MPI cluster). Note: I haven't run it as many times on the other platforms as on Titan, so it might fail on one of them occasionally. However, I saw zero failures in my testing.

The code is basically this:

    MPI_Recv_init()
    MPI_Rsend_init()
    while (condition) {
        MPI_Start(recv_request)
        MPI_Start(rsend_request)
        MPI_Waitall(both requests)
        Twiddle_sendbuf_slightly();
    }
    MPI_Request_free(recv_request)
    MPI_Request_free(rsend_request)

    MPI_Cart_shift(rotate source/dest different direction now)
    MPI_Recv_init()   // sending the other direction now, basically
    MPI_Rsend_init()
    while (condition) {
        MPI_Start(recv_request)
        MPI_Start(rsend_request)
        MPI_Waitall(both requests)
        Twiddle_sendbuf_slightly();
    }
    MPI_Request_free(recv_request)
    MPI_Request_free(rsend_request)

Is this considered a "correct program"? There are only a couple of paragraphs on persistent sends in the 800+ pages of the standard, and not much more on nonblocking ready sends (which is essentially what this becomes). It's pretty vague territory.

I tried splitting the Waitall() into two Wait() calls, explicitly waiting on the Recv request first and then on the Rsend request. However, this still fails, which suggests the requests are not happening in order:

    Rank 2 [Wed Nov 7 08:26:12 2018] [c5-0c0s3n1] Fatal error in PMPI_Wait: Other MPI error, error stack:
    PMPI_Wait(207).....................: MPI_Wait(request=0x7fffffff5698, status=0x7fffffff5630) failed
    MPIR_Wait_impl(100)................:
    MPIDI_CH3_PktHandler_ReadySend(829): Ready send from source 1 and with tag 1 had no matching receive

It strongly looks like the send is not always posted before the receive, or at least that the Waitall sometimes completes the send before the recv. I suspect that means an implementation bug. Cray might actually be doing something to optimize persistent communication or ready sends (or both) that we never did on BGQ (so at least it's not necessarily an MPICH vs. OMPI difference).

Thoughts? I'll open a bug with them at some point, but I wanted to verify the semantics first.

Thanks,
Brian Smith
Oak Ridge Leadership Computing Facility

_______________________________________________
mpi-forum mailing list
mpi-forum@lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpi-forum