What: Add an optimization for isend in pml/ob1. This optimization is a companion to the optimizations recently committed to 1.8.1. The basic idea is if the btl supports inline sends and the message is small (currently hardcoded but planned to be configurable) then try to send it with the inline send function. If this succeeds return ompi_request_empty for the request. From what I can tell the only requirement on a request returned by MPI_Isend is that the status indicates whether the request was cancelled so this should be ok.
Why: This optimization should improve small message rates when the btl provides an inline send function. This commit may or may not help application performance but it certainly gives better results with osu_bibw and osu_mbw_mr. When: This is targeted for 1.8.1 so I would like this to soak on the trunk for a little while before being moved over. As I said, it is conceptually the same as the other ob1 optimization. Setting the timeout for two weeks (April 29). The patch is attached. It has been tested with btl/ugni and btl/vader and I have seen a 10-30% improvement in the small message rate. -Nathan
diff --git a/ompi/mca/pml/ob1/pml_ob1_isend.c b/ompi/mca/pml/ob1/pml_ob1_isend.c index 5914e26..8f82b9a 100644 --- a/ompi/mca/pml/ob1/pml_ob1_isend.c +++ b/ompi/mca/pml/ob1/pml_ob1_isend.c @@ -124,8 +124,27 @@ int mca_pml_ob1_isend(void *buf, ompi_communicator_t * comm, ompi_request_t ** request) { - int rc; + mca_pml_ob1_comm_t* ob1_comm = comm->c_pml_comm; mca_pml_ob1_send_request_t *sendreq = NULL; + ompi_proc_t *dst_proc = ompi_comm_peer_lookup (comm, dst); + mca_bml_base_endpoint_t* endpoint = (mca_bml_base_endpoint_t*) + dst_proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML]; + int16_t seqn; + int rc; + + seqn = (uint16_t) OPAL_THREAD_ADD32(&ob1_comm->procs[dst].send_sequence, 1); + + if (MCA_PML_BASE_SEND_SYNCHRONOUS != sendmode) { + rc = mca_pml_ob1_send_inline (buf, count, datatype, dst, tag, seqn, dst_proc, + endpoint, comm); + if (OPAL_LIKELY(0 <= rc)) { + /* NTH: it is legal to return ompi_request_empty since the only valid + * field in a send completion status is whether or not the send was + * cancelled (which it can't be at this point anyway). */ + *request = &ompi_request_empty; + return OMPI_SUCCESS; + } + } MCA_PML_OB1_SEND_REQUEST_ALLOC(comm, dst, sendreq); if (NULL == sendreq) @@ -142,7 +161,7 @@ int mca_pml_ob1_isend(void *buf, &(sendreq)->req_send.req_base, PERUSE_SEND); - MCA_PML_OB1_SEND_REQUEST_START(sendreq, rc); + MCA_PML_OB1_SEND_REQUEST_START_W_SEQ(sendreq, endpoint, seqn, rc); *request = (ompi_request_t *) sendreq; return rc; }
pgpdyR9XTTFfD.pgp
Description: PGP signature