I thought this might be worth mentioning for anyone who runs into the same problem in the future...

Recently I had had enough of my cluster losing log messages and decided to investigate, because CTS fails if the "wrong" message goes missing.

Here's the sort of thing I was seeing on my log host:

c001n16:~ # grep "STATS: dropped" /var/log/messages
Jun 26 12:27:45 c001n16 syslog-ng[24834]: STATS: dropped 472
Jun 26 13:27:45 c001n16 syslog-ng[24834]: STATS: dropped 369
Jun 26 14:27:45 c001n16 syslog-ng[24834]: STATS: dropped 357
Jun 26 15:27:46 c001n16 syslog-ng[24834]: STATS: dropped 581
Jun 26 16:27:46 c001n16 syslog-ng[24834]: STATS: dropped 1301
Jun 26 17:27:46 c001n16 syslog-ng[24834]: STATS: dropped 156
Jun 26 18:27:46 c001n16 syslog-ng[24834]: STATS: dropped 362
Jun 26 19:27:46 c001n16 syslog-ng[24834]: STATS: dropped 184
Jun 26 20:27:46 c001n16 syslog-ng[24834]: STATS: dropped 1436
Jun 26 21:27:46 c001n16 syslog-ng[24834]: STATS: dropped 241
Jun 26 22:27:46 c001n16 syslog-ng[24834]: STATS: dropped 331
Jun 26 23:27:46 c001n16 syslog-ng[24834]: STATS: dropped 104
Jun 27 00:27:46 c001n16 syslog-ng[24834]: STATS: dropped 296

A non-trivial amount of lost messages :-(
Now keep in mind that CTS keeps the cluster in a constant state of upheaval and that this was an 8-node cluster.

Basically, the logs above indicate that (at least) one of the syslog-ng destinations can't keep up with the flow of log messages from the cluster.


The first thing to do is make sure you're using TCP connections instead of UDP (note that this only ensures the messages aren't lost _before_ they reach the log host).
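A minimal sketch of what TCP forwarding might look like in syslog-ng.conf. The host name "loghost", port 514 and the source name "src" are assumptions, "messages" is assumed to be an existing file destination on the log host, and d_loghost and s_remote are hypothetical names; adjust all of them to match your own file.

```conf
# On every cluster node: send local messages to the log host over TCP.
# "loghost", port 514 and the source name "src" are assumptions;
# "d_loghost" is a hypothetical destination name.
destination d_loghost { tcp("loghost" port(514)); };
log { source(src); destination(d_loghost); };

# On the log host: accept TCP connections from the nodes.
# "s_remote" is a hypothetical source name; "messages" is assumed to be
# an existing destination such as file("/var/log/messages").
source s_remote { tcp(ip("0.0.0.0") port(514)); };
log { source(s_remote); destination(messages); };
```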

Then disable destinations that are known to be slow.
Look for and disable (in syslog-ng.conf) anything with "console" in its name.
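For example, the console outputs can be switched off by commenting out both their destination definitions and the log paths that use them. The names console, xconsole, src and f_console, and the device paths, are assumptions based on a typical distribution default; use whatever your own syslog-ng.conf actually contains.

```conf
# Console outputs, commented out so syslog-ng stops pushing messages
# through them. All names, devices and filters here are assumptions.
#destination console  { pipe("/dev/tty10" group(tty) perm(0620)); };
#destination xconsole { pipe("/dev/xconsole" group(tty) perm(0400)); };
#log { source(src); filter(f_console); destination(console); };
#log { source(src); filter(f_console); destination(xconsole); };
```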

Then set a few options to reduce the amount of work syslog-ng needs to do for each remote message.
Look for (in syslog-ng.conf) "options { " and add:
   long_hostnames(off); check_hostname(no); dns_cache(yes); dns_cache_size(100);
   keep_hostname(yes); chain_hostnames(no);
Basically this tells syslog-ng to trust or at least cache the hostnames that were sent with the message.

Lastly, increase the output queue size. When the queue is full, that's when messages start getting thrown out (the default is 100).
Look for (in syslog-ng.conf) "options { " and add:
   log_fifo_size(4096);
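If only one destination is falling behind, syslog-ng also accepts log_fifo_size() as an option on an individual destination, overriding the global value for that destination alone. A sketch, assuming a file destination named "messages" that writes to /var/log/messages; the 8192 is purely illustrative, and since every queued message costs memory, a bigger queue trades RAM for fewer drops.

```conf
# Hypothetical per-destination override for the busiest file.
# The destination name and the 8192 value are assumptions.
destination messages { file("/var/log/messages" log_fifo_size(8192)); };
```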

The full set of options I'm using now is:
options { long_hostnames(off); sync(0); perm(0640); stats(3600); check_hostname(no); dns_cache(yes); dns_cache_size(100); log_fifo_size(4096); keep_hostname(yes); chain_hostnames(no); };
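The same options, one per line, with a short note on what each one does. The descriptions are my summary of the syslog-ng documentation and may differ slightly between releases (newer versions, for instance, spell stats() as stats_freq()), so check them against the version you run.

```conf
options {
    long_hostnames(off);   # don't build "host@relay" chains in the hostname field
    sync(0);               # number of lines buffered before writing to a file
    perm(0640);            # permissions for newly created log files
    stats(3600);           # emit the "STATS: dropped" line every 3600 s (hourly)
    check_hostname(no);    # skip validating characters in received hostnames
    dns_cache(yes);        # cache name lookups instead of resolving every message
    dns_cache_size(100);   # keep up to 100 entries in that cache
    log_fifo_size(4096);   # output queue length; overflow is what gets dropped
    keep_hostname(yes);    # trust the hostname sent inside the message
    chain_hostnames(no);   # same effect as long_hostnames(off)
};
```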

And since I've done that, the same grep shows:

c001n16:~ # grep "STATS: dropped" /var/log/messages
Jun 27 09:58:24 c001n16 syslog-ng[24834]: STATS: dropped 0
Jun 27 10:58:24 c001n16 syslog-ng[24834]: STATS: dropped 0
Jun 27 11:58:25 c001n16 syslog-ng[24834]: STATS: dropped 0
Jun 27 12:58:25 c001n16 syslog-ng[24834]: STATS: dropped 0
Jun 27 13:58:25 c001n16 syslog-ng[24834]: STATS: dropped 0
Jun 27 14:58:25 c001n16 syslog-ng[24834]: STATS: dropped 0


Hope this helps,
Andrew
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
