George, thanks for taking a look. Your patch looks good and I can confirm it 
fixes the hang I am seeing on our XE6.

-Nathan

On Thu, 3 May 2012, George Bosilca wrote:

Nathan,

You're right, when we loop trying to restart a failed request we must reset the 
convertor. However:
1. the position in this case is always zero, so we don't have to save the 
previous position in order to restore it.
2. all cases must be protected, not only the 
mca_pml_ob1_send_request_start_prepare.

Therefore there are 2 ways of doing this without affecting performance. We 
remove the initialization of the convertor in the send_init function, and force 
the convertor position to zero on every call to 
mca_pml_ob1_send_request_start_btl. Or, we reset the convertor position to zero 
every time we withdraw a failed request from the pending queue in order to 
restart it (in mca_pml_ob1_progress and in 
mca_pml_ob1_send_request_process_pending).

However, with the multiplication of send strategies in the OB1 (CUDA as an 
example) someone else should take a look in that code to apply the same logic.

Please find below a patch implementing the second method.

 george.


Index: ompi/mca/pml/ob1/pml_ob1_progress.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_progress.c (revision 26381)
+++ ompi/mca/pml/ob1/pml_ob1_progress.c (working copy)
@@ -53,6 +53,7 @@
            completed_requests++;
            break;
        case MCA_PML_OB1_SEND_PENDING_START:
+            MCA_PML_OB1_SEND_REQUEST_RESET(sendreq);
            endpoint = sendreq->req_endpoint;
            send_succedded = false;
            for(j = 0; j < 
(int)mca_bml_base_btl_array_get_size(&endpoint->btl_eager); j++) {
Index: ompi/mca/pml/ob1/pml_ob1_sendreq.c
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_sendreq.c  (revision 26381)
+++ ompi/mca/pml/ob1/pml_ob1_sendreq.c  (working copy)
@@ -70,7 +70,8 @@
                add_request_to_send_pending(sendreq,
                        MCA_PML_OB1_SEND_PENDING_START, true);
            } else {
-                rc = mca_pml_ob1_send_request_start_btl(sendreq, send_dst);
+                 MCA_PML_OB1_SEND_REQUEST_RESET(sendreq);
+                 rc = mca_pml_ob1_send_request_start_btl(sendreq, send_dst);
                if (OMPI_ERR_OUT_OF_RESOURCE == rc) {
                    /* No more resources on this btl so prepend to the pending
                     * list to minimize reordering and give up for now. */
Index: ompi/mca/pml/ob1/pml_ob1_sendreq.h
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_sendreq.h  (revision 26381)
+++ ompi/mca/pml/ob1/pml_ob1_sendreq.h  (working copy)
@@ -159,6 +159,13 @@
        (sendreq)->req_recv.pval = NULL;                                \
    }

+#define MCA_PML_OB1_SEND_REQUEST_RESET(sendreq)                             \
+{                                                                           \
+    size_t _position = 0;                                                   \
+    opal_convertor_set_position(&sendreq->req_send.req_base.req_convertor,  \
+                                &_position);                                \
+    assert( 0 == _position );                                               \
+}

static inline void mca_pml_ob1_free_rdma_resources(mca_pml_ob1_send_request_t* 
sendreq)
{


On May 1, 2012, at 10:55 , Hjelm, Nathan T wrote:

Ran across a problem in a failure path of start_prepare in ob1. If prepare_src 
succeed but send fails the send request convertor needs to be rolled back to 
the correct position. Can someone with more knowledge of ob1 check if this is 
indeed an error. Patch is below.

-Nathan

diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c 
b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
index 2a8ac03..5505918 100644
--- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
@@ -570,6 +570,7 @@ int mca_pml_ob1_send_request_start_prepare( 
mca_pml_ob1_send_request_t* sendreq,
                                            mca_bml_base_btl_t* bml_btl,
                                            size_t size )
{
+    size_t old_position = sendreq->req_send.req_base.req_convertor.bConverted;
    mca_btl_base_descriptor_t* des;
    mca_btl_base_segment_t* segment;
    mca_pml_ob1_hdr_t* hdr;
@@ -614,6 +615,9 @@ int mca_pml_ob1_send_request_start_prepare( 
mca_pml_ob1_send_request_t* sendreq,
        return OMPI_SUCCESS;
    }
    mca_bml_base_free(bml_btl, des );
+
+    opal_convertor_set_position(&sendreq->req_send.req_base.req_convertor,
+                                &old_position);
    return rc;
}


_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Reply via email to