This sounds like a memory problem in the mail application or the OS that spills over into the cluster software. Trace the running memory heaps in the dump.
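Before digging into a dump, it may help to confirm which processes are stuck and on which files. Below is a minimal sketch, assuming a Linux host with /proc, Python 3, and root privileges (needed to read other users' wchan and fd entries). It lists processes in state "D" whose WCHAN is dlm_posix_lock, as in the quoted report below, together with the files they hold open. The locked file should normally be among those paths, which may narrow down what "find" could not. The helper names (parse_state, read_text, open_files, stuck_processes) are hypothetical.

```python
import os
from typing import Iterator, List, Optional, Tuple


def parse_state(stat_line: str) -> str:
    """Extract the one-letter process state from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' and take the next field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def read_text(path: str) -> Optional[str]:
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None  # process exited, permission denied, or no such file


def open_files(pid: str) -> List[str]:
    """Return the targets of /proc/<pid>/fd/* (empty list if unreadable)."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return []
    targets = []
    for fd in fds:
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # fd closed between listdir and readlink
    return targets


def stuck_processes(wchan_name: str = "dlm_posix_lock") -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (pid, comm, open files) for D-state processes sleeping in wchan_name."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        stat = read_text(f"/proc/{pid}/stat")
        if stat is None or parse_state(stat) != "D":
            continue
        if read_text(f"/proc/{pid}/wchan") != wchan_name:
            continue
        comm = read_text(f"/proc/{pid}/comm") or "?"
        yield pid, comm, open_files(pid)


if __name__ == "__main__":
    for pid, comm, files in stuck_processes():
        print(pid, comm)
        for path in files:
            print("   ", path)
```

Running it on the affected node while the load average climbs should show the stuck IMAP processes and their open files; a path that recurs across all of them is a likely candidate for the locked file.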
On Fri, Oct 30, 2009 at 6:27 PM, Allen Belletti <[email protected]> wrote:
> Hi All,
>
> As I've mentioned before, I'm running a two-node clustered mail server on
> GFS2 (with RHEL 5.4). Nearly all of the time, everything works great.
> However, going all the way back to GFS1 on RHEL 5.1 (I think it was), I've
> had occasional locking problems that force a reboot of one or both cluster
> nodes. Lately I've paid closer attention since it's been happening more
> often.
>
> I'll notice the problem when the load average starts rising. It's always
> tied to "stuck" processes, and I believe always tied to IMAP clients (I'm
> running Dovecot). It seems like a file belonging to user "x" (in this case,
> "jforrest") will become locked in some way, such that every IMAP process tied
> to that user will get stuck on the same thing. Over time, as the user keeps
> trying to read that file, more and more processes accumulate. They're always
> in state "D" (uninterruptible sleep), and always on "dlm_posix_lock"
> according to WCHAN. The only way I'm able to get out of this state is to
> reboot. If I let it persist for too long, I/O generally stops entirely.
>
> This certainly seems like it ought to have a definite solution, but I've no
> idea what it is. I've tried a variety of things using "find" to pinpoint a
> particular file, but everything belonging to the affected user seems just
> fine. At least, I can read and copy all of the files, and do a stat via
> ls -l.
>
> Is it possible that this is a bug, not within GFS at all, but within
> Dovecot IMAP?
>
> Any thoughts would be appreciated. It's been getting worse lately and thus
> no fun at all.
>
> Cheers,
> Allen
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
