Hi,

On Thu, Jun 19, 2008 at 11:24:09AM -0400, Greg Haase wrote:
> Attached, please find an hb_report created for this particular setup for
> the timeframe when the issue occurred.
>
> I realize that we're not supposed to sanitize these because it could
> obfuscate important information, but I've had to go through and sed
> replace a bunch of stuff for security reasons. I hope I didn't destroy
> anything useful to troubleshooting.

No problem. There's nothing particularly interesting in the logs apart
from what you already reported. I still believe that this is a
performance problem. Did you notice that mysql is using a bit more than
6G of memory?

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
20173 mysql  15  0 6884m 6.1g 5596 S   28 78.8  3770:47 mysqld

According to the CPU time column it is also an extraordinarily busy
chap (see ps(1) and compare this time to the total time since the
database started, which you can find in ha-log).

You have to investigate the performance, collect statistics (sysstat,
sar), and see how to relieve the database, which seems to be both CPU
and memory bound. Perhaps turn to the mysql forums/support.

A few notes on your config:

- You don't have stonith. And you have shared storage. That's very,
  very dangerous. Indeed.

- You have monitor ops defined for all resources, but not for the main
  one, i.e. the one which is actually offering a usable service. You
  could remove all the others and just monitor mysql and pingd (and for
  pingd it's enough to do that once every, say, 5 minutes).

- On failover, stopping mysql took close to a minute, and the timeout
  for the stop operation is set to two minutes. Perhaps increase this
  timeout. Failed stop operations are rather difficult to recover from
  and, since you don't have stonith, such a failure would basically
  bring your database to a halt.

Good luck.

Dejan

> Also, I noticed that I almost _always_ get one of these G_SIG_dispatch
> delays in the logs at the time when the daily report information is
> output.
>
>
>
> On Tue, 2008-06-17 at 14:29 -0400, Greg Haase wrote:
> > Last week I emailed the list regarding a node failover that occurred
> > when IPAddr monitor timed out. At the same time, my log was showing
> > G_SIG_dispatch delays in lrmd. The thread ended in a petty discussion
> > over what was the proper time out value (although all the examples on
> > linux-ha.org show 3s here, it was suggested that I bump mine from 5s to
> > 15s).
> >
> > A few minutes ago, I experienced another failover, this one due to drbd
> > monitor failure. None of my other logs show any kind of disk error. In
> > fact, my MySQL error log (located on the very drbd disk that failed)
> > shows the shutdown messages subsequently issued by heartbeat. Again, the
> > monitor failure occurs at the same time that a G_SIG_display delay
> > occurs.
> >
> > Now does anyone have any idea why these errors may be occurring and is
> > there a way to resolve them.
> >
> > Please see attached log snippet.
> >
> >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
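
The suggestion in the reply to compare mysqld's accumulated CPU time
with the time since the database started can be sketched roughly as
below. This is only an illustration: the function names are made up
here, the TIME+ value is assumed to be in top's minutes:seconds form as
shown in the message, and the uptime is a placeholder because the real
start time has to be read from ha-log.

```python
def cpu_time_seconds(time_plus: str) -> float:
    """Convert a top(1) TIME+ value such as '3770:47' to seconds.

    Assumes the minutes:seconds (optionally with hundredths) form;
    other top display scales are not handled.
    """
    minutes, _, seconds = time_plus.partition(":")
    return int(minutes) * 60 + float(seconds)


def average_cpu_share(time_plus: str, uptime_seconds: float) -> float:
    """Return CPU time divided by wall-clock time since the process started.

    A value near 1.0 means roughly one CPU kept busy the whole time;
    values above 1.0 are possible on multi-CPU machines.
    """
    if uptime_seconds <= 0:
        raise ValueError("uptime_seconds must be positive")
    return cpu_time_seconds(time_plus) / uptime_seconds


if __name__ == "__main__":
    busy = cpu_time_seconds("3770:47")  # value from the top output in the reply
    print(f"mysqld CPU time: {busy:.0f} s (about {busy / 3600:.1f} hours)")

    # Hypothetical uptime of 7 days; replace with the real figure from ha-log.
    assumed_uptime = 7 * 24 * 3600
    share = average_cpu_share("3770:47", assumed_uptime)
    print(f"average CPU share over the assumed uptime: {share:.0%}")
```

The ratio only gives a long-run average; sar data would still be needed
to see whether the load lines up with the G_SIG_dispatch delays.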
