Hi again,

As an update, I was able to reproduce the error with a script that sends
4096 HUP signals to snmptrapd. After that, the process starts logging the
"maximum conf file count (4096) exceeded" error and forgets the configured
logging format.
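
The reproduction loop could be sketched in C roughly like this. It is only a sketch: the actual script isn't shown, and the name `send_hups` and the one-second pause (based on the roughly one-second restart time) are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `count` SIGHUPs to `pid`, pausing `delay_s`
 * seconds between them so the daemon can finish each config re-read.
 * Returns the number of signals that were sent successfully. */
int send_hups(pid_t pid, int count, unsigned delay_s)
{
    int sent = 0;
    for (int i = 0; i < count; i++) {
        if (kill(pid, SIGHUP) != 0)
            break;              /* e.g. the target process has exited */
        sent++;
        if (delay_s > 0)
            sleep(delay_s);
    }
    return sent;
}
```

Calling `send_hups(pid, 4096, 1)` with snmptrapd's process ID mirrors the 4096-signal test described above.
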
I don't suppose this is a known bug that was fixed some time after
5.6.1.1? It looks like an issue with the signal handling: there is only
one config file, but it is read again on every SIGHUP. It seems the config
file counter should be reset when SIGHUP is received.
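
A common way to get that reset is for the SIGHUP handler to do nothing but set a flag, and for the main loop to reset the counter and re-read the config when it sees the flag. A minimal sketch with hypothetical names (`conf_files`, `read_one_conf_file`, `handle_pending_hup`); it is not taken from snmptrapd:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

#define MAX_FILES 4096 /* limit quoted in the error message */

/* Hypothetical counter of config files read so far
 * (file-scope `static` = internal linkage, lives for the whole program). */
static int conf_files = 0;

/* Set asynchronously by the handler, checked by the main loop. */
static volatile sig_atomic_t hup_pending = 0;

static void on_sighup(int sig)
{
    (void)sig;
    hup_pending = 1;            /* only async-signal-safe work here */
}

int install_hup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);  /* sa_flags = 0 */
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Hypothetical stand-in for reading one config file. */
int read_one_conf_file(void)
{
    if (++conf_files > MAX_FILES)
        return -1;              /* "maximum conf file count exceeded" */
    /* ... parse the file here ... */
    return 0;
}

/* Called from the main loop. Returns 0 if no SIGHUP was pending,
 * 1 if the config was re-read, -1 if the re-read failed. */
int handle_pending_hup(void)
{
    if (!hup_pending)
        return 0;
    hup_pending = 0;
    conf_files = 0;             /* the per-SIGHUP reset */
    return read_one_conf_file() == 0 ? 1 : -1;
}
```

Because the counter is reset before each re-read, any number of SIGHUPs can be handled without reaching the limit.
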
I can't find the source for 5.6.1.1, but in the 5.6.2 code I don't see
what the problem could be, since "files" is in function-local scope and
should be zeroed every time read_config() is called. There could be some
scope confusion (or confusion on my part; I'm not a habitual C
programmer), or the problem was already fixed in 5.6.2.
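
One detail that might reconcile the two observations: in C, a variable in function-local scope is zeroed on every call only if it has automatic storage. If it is declared `static` inside the function, it is initialised once and keeps its value between calls. The sketch uses hypothetical function names and the 4096 limit from the error message; it is not the Net-SNMP source, and how `files` is actually declared in 5.6.1.1 would need checking.

```c
#include <stdio.h>

#define MAX_FILES 4096 /* limit quoted in the error message */

/* Hypothetical: counter declared `static` inside the function.
 * Initialised once at program start and kept between calls, so if it is
 * only ever incremented, call number MAX_FILES + 1 fails. */
int count_with_static(void)
{
    static int files = 0;
    if (++files > MAX_FILES) {
        fprintf(stderr, "maximum conf file count (%d) exceeded\n", MAX_FILES);
        return -1;
    }
    return 0;
}

/* Hypothetical: plain (automatic) local counter.
 * Re-initialised to 0 on every call, so it never reaches the limit. */
int count_with_auto(void)
{
    int files = 0;
    if (++files > MAX_FILES) {
        fprintf(stderr, "maximum conf file count (%d) exceeded\n", MAX_FILES);
        return -1;
    }
    return 0;
}
```

With the `static` version, the 4097th call fails and every later call keeps failing; the automatic version never reaches the limit.
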

I'm trying to reproduce the bug on 5.7.1 now, but unfortunately, since
snmptrapd takes about a second to restart on SIGHUP, the test takes just
over an hour to execute.
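
A rough check of the arithmetic, assuming exactly one SIGHUP per log rotation and the 4096 limit from the error message; the function names are illustrative only:

```c
#define HUP_LIMIT 4096 /* conf file count limit from the error message */

/* Minutes for HUP_LIMIT restarts at `seconds_per_restart` each. */
double minutes_for_test(double seconds_per_restart)
{
    return HUP_LIMIT * seconds_per_restart / 60.0;
}

/* Days until the limit is reached with one rotation (one SIGHUP)
 * every `minutes_per_rotation` minutes. */
double days_to_limit(double minutes_per_rotation)
{
    return HUP_LIMIT * minutes_per_rotation / (24.0 * 60.0);
}

/* Average minutes between rotations implied by a failure after `days`. */
double implied_interval_minutes(double days)
{
    return days * 24.0 * 60.0 / HUP_LIMIT;
}
```

At about 1 s per restart, `minutes_for_test(1.0)` is roughly 68 minutes. At one rotation every 30 minutes, `days_to_limit(30.0)` is roughly 85 days, and the observed 60 to 113 day intervals correspond to average rotation intervals of roughly 21 to 40 minutes, which fits a rotation rate that "goes up and down a bit" around 30 minutes.
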

BR,
Joel Hansell




On Tue, Feb 4, 2014 at 10:58 AM, Joel Hansell <joel.hans...@gmail.com> wrote:

> Hi list,
>
> Here's one I've been scratching my head over lately.
>
> We have a bit of an oddball solution, where we've got snmptrapd logging a
> lot of traps into a file, which is parsed by an external tool. This is all
> running on an HP-UX 11.2 system, with Net-SNMP version 5.6.1.1 delivered
> with the "HP-UX Internet Express" package.
>
> A cron job runs "logrotate" every 15 minutes, and if the file is too big,
> it's rotated, and the postrotate script issues a SIGHUP to snmptrapd. That
> normally triggers the daemon to re-read its config and to restart the
> logging into a new file.
>
> snmptrapd.conf is set up to use a particular one-line trap logging
> format.
>
> Every so often, snmptrapd fails subtly on SIGHUP. It only seems to
> happen after more than a couple of months have passed; observed intervals
> between faults include 60, 91, 101 and 113 days.
>
> I've observed the following about the failure state after it happens:
> - snmptrapd is still running and logging traps
> - The trap logging format has reverted to the default format (three
> lines per trap), which causes our parser to fail
> - snmptrapd logs the error "[...]/snmptrapd.conf: line 0: Error: maximum
> conf file count (4096) exceeded" at the start of the log file.
>
> The flow of traps is such that the log file is usually rotated every 30
> minutes, but it goes up and down a bit. Could this failure be happening
> after 4096 SIGHUPs? That would explain the varying time between failures.
>
> I'm grateful for any input.
>
> Regards,
> Joel Hansell
>
>
_______________________________________________
Net-snmp-users mailing list
Net-snmp-users@lists.sourceforge.net
