Re: [ofa-general] Re: [RFC][PATCH] last wqe event handler patch

Shirley Ma Wed, 25 Jun 2008 15:09:23 -0700



Roland Dreier <[EMAIL PROTECTED]> wrote on 06/25/2008 02:43:19 PM:

> Interesting... I wonder if it really is taking that long for everything
> to finish draining, or if the system is too busy so it sees a spurious
> timeout?  The intention of all of this is that it should "never happen"
> unless the hardware really is stuck.

I guess the reason might be that we have a large cluster where each node has
4 ports, so there are too many RC QPs in this setup. We saw QPs go dead, and
the 5-second drain didn't work.

> What exactly is causing the crash here?

You can ignore this for now; it's related to another patch, not the current
code level. I will explain it in the drain WR post_send failure patch.

Please review the stale connection resource cleanup patch to see whether it
makes sense.

thanks
Shirley

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

