Re: [Linux-cluster] Freeze with cluster-2.03.11

Wendy Cheng Sat, 28 Mar 2009 09:07:47 -0700

Kadlecsik Jozsef wrote:

I don't see strong evidence of a deadlock (though there could be one) in
the thread backtraces. However, assuming the cluster worked before, you
could have overloaded the e1000 driver in this case. There are suspicious
page faults, but memory is very "ok". So one possibility is that GFS
generated too many sync requests, which flooded the e1000. As a result,
the cluster heartbeat missed its interval.

It's a possibility. But it also assumes that the node freezes >because< it was fenced off. So far nothing indicates that.

Re-read your console log. There are many footprints of spin_lock in it, which is worrisome. Hit "sysrq-w" a couple of times the next time you have hangs, in addition to sysrq-t. This should give traces of the threads that are actively on CPUs at that time. Also check your kernel change log (to see whether GFS has any new patch touching spin locks that wasn't in the previous release).
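For anyone who cannot reach the console keyboard during a hang, the same dumps can usually be requested by writing the command character to /proc/sysrq-trigger as root. Below is a minimal sketch of that idea, not something from the original posting: the function names, the repeat count and the delay are made up for illustration. What exactly sysrq-w dumps depends on the kernel version, and the output lands in the kernel log (console or dmesg), not in the script.

```python
import time

# Standard Linux procfs interface for SysRq; writing to it requires root.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(key, path=SYSRQ_TRIGGER):
    """Write a single SysRq command character to the trigger file."""
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(path, "w") as f:
        f.write(key)


def capture_hang_traces(repeats=3, delay=2.0, path=SYSRQ_TRIGGER):
    """Send sysrq-w `repeats` times, then one sysrq-t.

    `repeats` and `delay` are illustrative defaults (assumptions).
    Returns the list of keys that were sent, in order.
    """
    sent = []
    for _ in range(repeats):
        trigger_sysrq("w", path)
        sent.append("w")
        time.sleep(delay)  # space the samples so the traces can be compared
    trigger_sysrq("t", path)
    sent.append("t")
    return sent
```

Calling capture_hang_traces() as root while the node is hanging, then collecting the console log, gives several samples to compare; threads that show up in the same spin_lock path in every sample are the ones to look at first.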

BTW, I do have opinions on other parts of your postings but don't have time to express them now. Maybe I'll say something when I finish my current chores :) ... Need to rush out now. Good luck with your debugging!


-- Wendy

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
