I prefer the original patch, the reason is: drain_list only have one
element, which the first of of the flush list, whenever its drain WR
processed, it will be put in reap_list, why we need drain_list here?

Thanks
Shirley



                                                                       
             Roland                                                    
             Dreier                                                    
             <rdreier@                                                  To
             cisco.com         <[EMAIL PROTECTED]>                    
             >                                                          cc
                               Shirley Ma/Beaverton/[EMAIL PROTECTED],         
             06/25/08          <[EMAIL PROTECTED]>,
             11:45 AM          "Olga Stern" <[EMAIL PROTECTED]>,      
                               "OpenFabrics General"                   
                               <[email protected]>         
                                                                   Subject
                               Re: [ofa-general] Re: [RFC][PATCH] last wqe
                               event handler patch                     
                                                                       
                                                                       
                                                                       
                                                                       
                                                                       
                                                                       




So I think I'll queue this up -- does this make sense to everyone?

commit af806651d0042ca014eef62832c7e7083013b954
Author: Roland Dreier <[EMAIL PROTECTED]>
Date:   Wed Jun 25 11:44:47 2008 -0700

    IPoIB/cm: Drain receive QPs one at a time

    The current handling of last WQE reached events is broken when
    multiple QPs are drained at once -- for example:

    1. Last WQE reached event for QP100: QP100 context is added into
       flush_list, and then it is put into drain_list, and does post_send
       of a drain WR.

    2. Last WQE reached event for QP200: QP200 context is added into
       flush_list, but not drain_list since only one drain WR will be
       posted.

    3. Last WQE reached event for QP300: QP300 context is added into
       flush_list, but not drain_list since only one drain WR will be
       posted.

    So at this point, QP100 is on drain_list and QP200 and QP300 are on
    flush_list.

    4. QP100 drain WR CQE is handled: put QP100 into reap_list then call
       ipoib_cm_start_rx_drain(), which does post_send of QP200 drain WR,
       and moves both QP200 and QP300 from flush_list to drain_list.

    5. QP200 drain WR CQE is polled, which will move both QP200 and QP300
       from drain_list to reap_list.

    6. QP200 and QP300 are both freed from reap_list.

    7. A CQE for QP300 is seen, but QP300 context has been freed --->
crash.

    The fundamental problem is that a last WQE reached event does not mean
    all pending completions have been queued.

    Fix this by moving QPs from the flush_list to the drain_list one at a
    time, so that we never free QPs until we really know that we are done
    with them.

    Debugged-by: Shirley Ma <[EMAIL PROTECTED]>
    Fix-suggested-by: Eli Cohen <[EMAIL PROTECTED]>
    Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 6223fc3..793f2bd 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -222,7 +222,7 @@ static void ipoib_cm_start_rx_drain(struct
ipoib_dev_priv *priv)
             if (ib_post_send(p->qp, &ipoib_cm_rx_drain_wr, &bad_wr))
                         ipoib_warn(priv, "failed to post drain wr\n");

-            list_splice_init(&priv->cm.rx_flush_list,
&priv->cm.rx_drain_list);
+            list_move(&p->list, &priv->cm.rx_drain_list);
 }

 static void ipoib_cm_rx_event_handler(struct ib_event *event, void *ctx)

<<inline: graycol.gif>>

<<inline: pic21829.gif>>

<<inline: ecblank.gif>>

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to