Still failing in our test as well.

Amit

-----Original Message-----
From: Scott Weitzenkamp (sweitzen) [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 12, 2007 1:32 AM
To: Michael S. Tsirkin; Scott Weitzenkamp (sweitzen)
Cc: Yohad Dickman; Amit Krig; Tziporet Koren; Michael S. Tsirkin; [email protected]; Roland Dreier
Subject: RE: [PATCH] ipoib/cm: make stale task actually run once in a while (DOES NOT HELP)
Importance: High

This patch, which is in OFED-1.2-20070511-0600, does NOT help. I am
still seeing 105-second port failover times.

Amit, did you try it?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:[EMAIL PROTECTED]
> Sent: Monday, May 07, 2007 1:03 PM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Yohad Dickman; Amit Krig; Tziporet Koren; [EMAIL PROTECTED];
> [email protected]; Roland Dreier
> Subject: [PATCH] ipoib/cm: make stale task actually run once in a while
>
> In the presence of some active passive connections, stale task would
> never run, since each 4 RX CQEs we repeat queue_delayed_work calls
> which delays it for some 10 minutes. As a result, on a noisy system
> with failing ports, we slowly run out of resources - slowing
> connection setup down and eventually failing.
>
> What we actually want to do is - start stale task when a first passive
> connection is added, rerun it every 10 min as long as there are
> outstanding passive connections.
>
> As a happy side effect, this removes some code from RX data path.
>
> Signed-off-by: Michael S. Tsirkin <[EMAIL PROTECTED]>
>
> ---
>
> Scott, I think this might address bugs 541 and 465: slow IPoIB CM HA
> failover and eventual failing IPoIB HA. Could you test this please?
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 2b242a4..b77e8d7 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -258,10 +258,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
>  	cm_id->context = p;
>  	p->jiffies = jiffies;
>  	spin_lock_irqsave(&priv->lock, flags);
> +	if (list_empty(&priv->cm.passive_ids))
> +		queue_delayed_work(ipoib_workqueue,
> +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	list_add(&p->list, &priv->cm.passive_ids);
>  	spin_unlock_irqrestore(&priv->lock, flags);
> -	queue_delayed_work(ipoib_workqueue,
> -			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	return 0;
>
>  err_rep:
> @@ -380,8 +381,6 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>  			if (!list_empty(&p->list))
>  				list_move(&p->list, &priv->cm.passive_ids);
>  			spin_unlock_irqrestore(&priv->lock, flags);
> -			queue_delayed_work(ipoib_workqueue,
> -					   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  		}
>  	}
>
> @@ -1104,6 +1103,10 @@ static void ipoib_cm_stale_task(struct work_struct *work)
>  		kfree(p);
>  		spin_lock_irqsave(&priv->lock, flags);
>  	}
> +
> +	if (!list_empty(&priv->cm.passive_ids))
> +		queue_delayed_work(ipoib_workqueue,
> +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	spin_unlock_irqrestore(&priv->lock, flags);
>  }
>
> --
> MST
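
For readers following the thread, the scheduling scheme the patch description aims for (arm the stale task when the first passive connection is added, then let the task re-arm itself while any passive connections remain) can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the driver code: `struct sim`, `arm`, `add_passive`, `rx_completion`, `stale_task` and `tick` are hypothetical names, time is a plain counter in seconds, and `STALE_DELAY` stands in for `IPOIB_CM_RX_DELAY` using the "some 10 minutes" figure from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: roughly the "some 10 minutes" mentioned in the patch text. */
#define STALE_DELAY 600L

/* Hypothetical model of the per-device state the patch touches. */
struct sim {
    int  passive;  /* entries on the passive connection list */
    bool armed;    /* is the stale task currently queued? */
    long due;      /* simulated time (seconds) at which it fires */
    int  runs;     /* how many times the stale task has run */
};

/* Queue the task unless it is already pending. */
static void arm(struct sim *s, long now)
{
    if (!s->armed) {
        s->armed = true;
        s->due = now + STALE_DELAY;
    }
}

/* REQ handler: arm only when the list goes from empty to non-empty. */
static void add_passive(struct sim *s, long now)
{
    if (s->passive == 0)
        arm(s, now);
    s->passive++;
}

/* RX completion: with the patch applied it no longer touches the task. */
static void rx_completion(struct sim *s, long now)
{
    (void)s;
    (void)now;
}

/* Stale task: drop `reaped` connections, re-arm while any remain. */
static void stale_task(struct sim *s, long now, int reaped)
{
    assert(reaped >= 0 && reaped <= s->passive);
    s->armed = false;
    s->runs++;
    s->passive -= reaped;
    if (s->passive > 0)
        arm(s, now);
}

/* Advance the clock to `now`; run the task if it is due. */
static bool tick(struct sim *s, long now, int reaped)
{
    if (s->armed && now >= s->due) {
        stale_task(s, now, reaped);
        return true;
    }
    return false;
}
```

In this model, RX traffic between two runs has no effect on `due`, so the task fires one `STALE_DELAY` after the first passive connection appears and then once per `STALE_DELAY` until the list drains. Whether that is enough to fix the failover times reported above is a separate question that the model does not address.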
