What: Add an optimization for isend in pml/ob1. This optimization is a
companion to the optimizations recently committed to 1.8.1. The basic
idea is that if the btl supports inline sends and the message is small
(the threshold is currently hardcoded but planned to be configurable),
we try to send it with the inline send function. If this succeeds, we
return ompi_request_empty for the request. From what I can tell, the
only requirement on a request returned by MPI_Isend is that the status
indicates whether the request was cancelled, so this should be ok.
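
To make the control flow concrete, here is a minimal standalone sketch of
the idea. It is not the ob1 code: every toy_* name, the 64-byte limit, and
the calloc-based slow path are assumptions made up for illustration. It
only shows the shape of the fast path (skip synchronous sends, try the
inline send, hand back a shared pre-completed request on success, otherwise
fall through to the normal request path with the same sequence number).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are Open MPI types or functions. */
#define TOY_INLINE_LIMIT 64   /* assumed small-message threshold, in bytes */

typedef struct toy_request {
    int complete;    /* 1 once the send has finished */
    int cancelled;   /* the only status field a send request must report */
} toy_request_t;

/* One shared, already-complete request, playing the role that
 * ompi_request_empty plays in the patch. It must never be freed. */
toy_request_t toy_request_empty = { 1, 0 };

typedef struct toy_btl {
    /* NULL when the transport has no inline send; returns >= 0 on success. */
    int (*send_inline)(const void *buf, size_t len, uint16_t seq);
} toy_btl_t;

/* Toy inline send that always succeeds. */
int toy_send_inline_ok(const void *buf, size_t len, uint16_t seq)
{
    (void) buf; (void) len; (void) seq;
    return 0;
}

int toy_isend(toy_btl_t *btl, const void *buf, size_t len, int synchronous,
              uint16_t *next_seq, toy_request_t **request)
{
    assert(NULL != btl && NULL != next_seq && NULL != request);

    /* Take the sequence number once, before choosing a path, so the
     * fallback reuses it instead of consuming a second one. */
    uint16_t seq = (*next_seq)++;

    if (!synchronous && NULL != btl->send_inline && len <= TOY_INLINE_LIMIT) {
        if (btl->send_inline(buf, len, seq) >= 0) {
            /* Fast path: nothing left to track, so no allocation. */
            *request = &toy_request_empty;
            return 0;
        }
    }

    /* Slow path: allocate a real request. A real implementation would
     * start the send here using seq. */
    toy_request_t *req = calloc(1, sizeof(*req));
    if (NULL == req) {
        return -1;
    }
    *request = req;
    return 0;
}
```

In this sketch a caller has to compare the returned pointer against
&toy_request_empty before freeing it. The real request layer handles the
empty request on its own, which is part of why returning it is cheap.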

Why: This optimization should improve small message rates when the btl
provides an inline send function. This commit may or may not help
application performance but it certainly gives better results with
osu_bibw and osu_mbw_mr.

When: This is targeted for 1.8.1, so I would like this to soak on the
trunk for a little while before being moved over. As I said, it is
conceptually the same as the other ob1 optimization. Setting the timeout
for two weeks (April 29).

The patch is attached. It has been tested with btl/ugni and btl/vader
and I have seen a 10-30% improvement in the small message rate.

-Nathan
diff --git a/ompi/mca/pml/ob1/pml_ob1_isend.c b/ompi/mca/pml/ob1/pml_ob1_isend.c
index 5914e26..8f82b9a 100644
--- a/ompi/mca/pml/ob1/pml_ob1_isend.c
+++ b/ompi/mca/pml/ob1/pml_ob1_isend.c
@@ -124,8 +124,27 @@ int mca_pml_ob1_isend(void *buf,
                       ompi_communicator_t * comm,
                       ompi_request_t ** request)
 {
-    int rc;
+    mca_pml_ob1_comm_t* ob1_comm = comm->c_pml_comm;
     mca_pml_ob1_send_request_t *sendreq = NULL;
+    ompi_proc_t *dst_proc = ompi_comm_peer_lookup (comm, dst);
+    mca_bml_base_endpoint_t* endpoint = (mca_bml_base_endpoint_t*)
+                                        dst_proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML];
+    int16_t seqn;
+    int rc;
+
+    seqn = (uint16_t) OPAL_THREAD_ADD32(&ob1_comm->procs[dst].send_sequence, 1);
+
+    if (MCA_PML_BASE_SEND_SYNCHRONOUS != sendmode) {
+        rc = mca_pml_ob1_send_inline (buf, count, datatype, dst, tag, seqn, dst_proc,
+                                      endpoint, comm);
+        if (OPAL_LIKELY(0 <= rc)) {
+            /* NTH: it is legal to return ompi_request_empty since the only valid
+             * field in a send completion status is whether or not the send was
+             * cancelled (which it can't be at this point anyway). */
+            *request = &ompi_request_empty;
+            return OMPI_SUCCESS;
+        }
+    }
 
     MCA_PML_OB1_SEND_REQUEST_ALLOC(comm, dst, sendreq);
     if (NULL == sendreq)
@@ -142,7 +161,7 @@ int mca_pml_ob1_isend(void *buf,
                              &(sendreq)->req_send.req_base,
                              PERUSE_SEND);
 
-    MCA_PML_OB1_SEND_REQUEST_START(sendreq, rc);
+    MCA_PML_OB1_SEND_REQUEST_START_W_SEQ(sendreq, endpoint, seqn, rc);
     *request = (ompi_request_t *) sendreq;
     return rc;
 }
