Re: [Linux-HA] More monitor failures near G_SIG_dispatch delays

Dejan Muhamedagic Fri, 20 Jun 2008 03:42:08 -0700

Hi,

On Thu, Jun 19, 2008 at 11:24:09AM -0400, Greg Haase wrote:
> Attached, please find an hb_report created for this particular setup for
> the timeframe when the issue occurred.
> 
> I realize that we're not supposed to sanitize these because it could
> obfuscate important information, but I've had to go through and sed
> replace a bunch of stuff for security reasons. I hope I didn't destroy
> anything useful to troubleshooting.


No problem.

There's nothing particularly interesting in the logs apart from
what you already reported. I still believe that this is a
performance problem. Did you notice that mysql is using a bit
more than 6G of memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND            
20173 mysql     15   0 6884m 6.1g 5596 S   28 78.8   3770:47 mysqld             

According to the CPU time column it is also an extraordinarily
busy chap (see ps(1) and compare this time to the total time
since the database started, which you can find in ha-log).
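
To make that comparison concrete, here is a minimal Python sketch. It
assumes the TIME+ value above is in minutes:seconds, and the elapsed
time is a placeholder, since the real start time has to be read from
ha-log. The function names are made up for illustration.

```python
def parse_top_time(value: str) -> float:
    """Convert a top(1) TIME+ value such as '3770:47' (assumed to be
    minutes:seconds, optionally with hundredths) to seconds."""
    minutes, _, seconds = value.partition(":")
    return int(minutes) * 60 + float(seconds)


def average_cpu_share(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Average share of one CPU used over the lifetime of the process."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def implied_total_memory(resident_gib: float, mem_percent: float) -> float:
    """Total RAM implied by the RES and %MEM columns, in GiB."""
    return resident_gib / (mem_percent / 100.0)


# Values taken from the top output above.
cpu_seconds = parse_top_time("3770:47")  # 3770 min 47 s = 226247 s

# HYPOTHETICAL placeholder: replace with the real time since mysqld
# started, as found in ha-log.
elapsed_seconds = 5 * 24 * 3600

print(f"CPU seconds used:  {cpu_seconds:.0f}")
print(f"Average CPU share: {average_cpu_share(cpu_seconds, elapsed_seconds):.0%}")
print(f"Implied total RAM: {implied_total_memory(6.1, 78.8):.1f} GiB")
```

A share close to or above 100% of one CPU over the whole lifetime
would support the suspicion that the database is CPU bound.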

You have to investigate the performance and collect statistics
(sysstat, sar) and see how to relieve the database which seems to
be both CPU and memory bound. It may also be worth turning to
the MySQL forums/support.

A few notes on your config:

- You don't have stonith. And you have shared storage. That's
  very very dangerous. Indeed.

- You have monitor ops defined for all resources, but not for the
  main one, i.e. the one which is actually offering a usable
  service. You could remove all of them and monitor just mysql
  and pingd (for pingd, once every 5 minutes or so is enough).

- On failover, stopping mysql took close to a minute, and the
  timeout for the stop operation is set to two minutes. Perhaps
  increase this timeout. Failed stop operations are rather
  difficult to recover from and since you don't have stonith,
  such a failure would basically bring your database to a halt.

Good luck.

Dejan


> Also, I noticed that I almost _always_ get one of these G_SIG_dispatch
> delays in the logs at the time when the daily report information is
> output.
> 
> 
> 
> On Tue, 2008-06-17 at 14:29 -0400, Greg Haase wrote:
> > Last week I emailed the list regarding a node failover that occurred
> > when IPAddr monitor timed out. At the same time, my log was showing
> > G_SIG_dispatch delays in lrmd.  The thread ended in a petty discussion
> > over what was the proper time out value (although all the examples on
> > linux-ha.org show 3s here, it was suggested that I bump mine from 5s to
> > 15s).
> > 
> > A few minutes ago, I experienced another failover, this one due to drbd
> > monitor failure. None of my other logs show any kind of disk error. In
> > fact, my MySQL error log (located on the very drbd disk that failed)
> > shows the shutdown messages subsequently issued by heartbeat. Again, the
> > monitor failure occurs at the same time that a G_SIG_dispatch delay
> > occurs.
> > 
> > Now does anyone have any idea why these errors may be occurring and is
> > there a way to resolve them.
> > 
> > Please see attached log snippet.
> > 
> > 
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems


