Roland Dreier <[EMAIL PROTECTED]> wrote on 06/25/2008 02:43:19 PM:
> Interesting... I wonder if it really is taking that long for everything
> to finish draining, or if the system is too busy so it sees a spurious
> timeout? The intention of all of this is that it should "never happen"
> unless the hardware really is stuck.
I guess the reason might be that we have a large cluster where each node has
4 ports, so there are too many RC QPs in this setup. We saw QPs go dead, and
the 5-second drain didn't work.
> What exactly is causing the crash here?
You can ignore this for now; it's related to another patch, not the current
code level. I will explain it in the drain WR post_send failure patch.
Please review the stale connection resource cleanup patch to see whether it
makes sense.
Thanks,
Shirley