Re: [Linux-HA] Antw: Re: Massive amount of log messages after node failure

Dejan Muhamedagic Wed, 18 May 2011 07:48:18 -0700

Hi,

On Wed, May 18, 2011 at 09:03:29AM +0200, Ulrich Windl wrote:
> >>> Lars Marowsky-Bree <[email protected]> wrote on 17.05.2011 at 22:39 in
> >>> message
> <[email protected]>:
> > On 2011-05-17T17:16:51, Ulrich Windl <[email protected]> wrote:
> > 
> > > I think that pacemaker is logging too much all the time, so you hardly
> > > can find out if there really is a problem. For example external/sbd is
> > > logging a message every time the shared disk is OK, that is every 30s
> > > or so.
> > 
> > It should not - the external/sbd status code path doesn't have any log
> > messages? What do you see?
> 
> Apr 28 17:10:11 host2 stonith: [7890]: info: external/sbd device OK.
> Apr 28 17:10:42 host2 stonith: [7951]: info: external/sbd device OK.
> Apr 28 17:11:13 host2 stonith: [8007]: info: external/sbd device OK.
> Apr 28 17:11:44 host2 stonith: [8063]: info: external/sbd device OK.


This happens because the plugin is invoked via stonith(8) (the
program) and stonith does the logging. I did notice that before,
but didn't remove the message because the best practice is to
monitor fencing devices only every once in a while (say every few
hours), in which case the message shows up rarely.
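
To check how often the monitor actually runs on a given node, the
timestamps of those messages can be compared. A minimal Python
sketch, using the four lines quoted above as input. The function
name monitor_intervals is made up for illustration, and the year
2011 is an assumption taken from the date of this thread, since
classic syslog timestamps carry no year:

```python
from datetime import datetime

# Sample lines copied from the syslog excerpt quoted above.
LOG_LINES = [
    "Apr 28 17:10:11 host2 stonith: [7890]: info: external/sbd device OK.",
    "Apr 28 17:10:42 host2 stonith: [7951]: info: external/sbd device OK.",
    "Apr 28 17:11:13 host2 stonith: [8007]: info: external/sbd device OK.",
    "Apr 28 17:11:44 host2 stonith: [8063]: info: external/sbd device OK.",
]

MARKER = "external/sbd device OK"


def monitor_intervals(lines, marker=MARKER, year=2011):
    """Return the gaps in seconds between consecutive lines containing marker.

    `year` is an assumption: the syslog prefix ("Apr 28 17:10:11") has no
    year, so one is supplied to get a complete date for parsing.
    """
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        # The first 15 characters hold the "Mon DD HH:MM:SS" prefix.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        stamps.append(stamp)
    # Pair each timestamp with its successor and take the difference.
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(monitor_intervals(LOG_LINES))
```

For the excerpt above this prints [31, 31, 31], i.e. the monitor
runs roughly every 30 seconds rather than every few hours.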

Thanks,

Dejan

> > In general, turning down logging is something that we do, but with care
> > - disk space is cheap, missing the information to diagnose a problem
> > after the first failure and needing to recreate it is not. I'd rather
> > err on the conservative side. If you're looking for important bits,
> > filtering for warn/crit/err/emerg should do.
> > 
> > Syslog has the advantage of seeing all messages in context, an
> > incredibly valuable aspect.
> 
> In some older software I wrote I did collect debug messages in a separate 
> file, and when no errors occurred the file was deleted. In case of an error 
> the file was mailed together with the error message. That kind of approach 
> makes much more sense than creating megabytes of messages that nobody cares 
> about.
> 
> > 
> > > And of course, I wouldn't complain if I hadn't done it better long time
> > > ago:
> > 
> > Ah, so you're offering patches! ;-) Excellent, we look forward to
> > reviewing them - please post them on the respective development mailing
> > lists.
> 
> Once I have set up our development system I will definitely have a look at 
> all the stuff, but I'll be quite busy with more important tasks in the next 
> weeks or months.
> 
> > 
> > > Seeing pacemaker logs, I feel the programmers just left their personal
> > > debugging messages in there which nobody really understands. An example:
> > 
> > Of course. Some of the messages are intended to be read by developers
> > when we try to diagnose customer/user problems. They get very anxious
> > when we can't. ;-)
> > 
> > Like I said, we're always tuning them down - you'll find that they are a
> > lot quieter nowadays than they were 2 years ago, and in theory, a
> > cluster that doesn't do anything won't log much. What you quoted was,
> > however, from an active transition - the cluster was actively doing
> > something anyway, and we'd rather be able to figure it out in
> > retrospect.
> 
> I have a cluster that just has an SBD device configured (it's about to be 
> completed soon). It's producing a lot of messages all the time. I'm still 
> unsure whether there is a problem or not, but once I know better, I'll ask 
> again.
> 
> Regards,
> Ulrich
> 
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems