Hello.
My snmpd agent has started crashing on me; I spent some
time figuring out how and why. I think I know that now - but not how
to solve it.
My problem may be related to bug 1154117 in the bug database.
The behaviour is not identical, however.
Version is the Debian package 5.2.1.2-2, on an Intel processor.
The crash happens shortly after a subagent has sent a notification
that must be forwarded as an inform. The crash only happens when the
receiver is not responding, so the inform times out; the snmpd reports the
timeout to syslog, and shortly after the snmpd crashes.
The crash is a stack overrun; gdb shows that 14 functions are
calling each other recursively - when the crash happens, the stack is more
than 300 function calls deep. This is the bottom of the stack:
#311 0xb7dd938c in snmp_read () at snmp_api.c:5260
#312 0xb7daf908 in snmp_synch_response_cb () at snmp_client.c:813
#313 0xb7daf9f7 in snmp_synch_response () at snmp_client.c:851
#314 0xb7e73565 in send_trap_to_sess () at agent_trap.c:829
#315 0xb7ecea09 in send_notifications () at snmpNotifyTable.c:127
#316 0xb7df63a3 in snmp_call_callbacks () at callback.c:224
#317 0xb7e73081 in netsnmp_send_traps () at agent_trap.c:778
#318 0xb7e73487 in send_enterprise_trap_vars () at agent_trap.c:792
#319 0xb7e73675 in send_trap_vars () at agent_trap.c:849
#320 0xb7e8b26c in agentx_notify () at master_admin.c:440
#321 0xb7e8b73f in handle_master_agentx_packet () at master_admin.c:532
#322 0xb7dd8ed8 in _sess_process_packet () at snmp_api.c:? (as callback, more
than one callback available)
#323 0xb7dd974e in _sess_read () at snmp_api.c:5526
#324 0xb7dda569 in snmp_sess_read () at snmp_api.c:5624
#325 0xb7dd938c in snmp_read () at snmp_api.c:5260
#326 0x0804be90 in receive () at snmpd.c:1149
#327 0x0804b298 in main (argc=7, argv=0xbffffd64) at snmpd.c:993
(I have edited this slightly, to include source file and line numbers for
functions in the library.)
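The problem seems to be a conflict between two coding styles: the snmpd
agent has a main "select" loop in the "receive" function, but when an
inform is processed, the function "send_trap_to_sess" calls
"snmp_synch_response_cb", which runs its own select loop and "steals"
the control flow. Judging from the trace, that inner loop calls
"snmp_read" again, which picks up the next AgentX notification and starts
another inform while the first one is still waiting, and so on.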
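To illustrate what I think is happening, here is a minimal self-contained model of the recursion. It is not net-snmp code: all names (sim_dispatch, sim_wait_sync and so on) are hypothetical, and it assumes that every nested read finds one more queued notification while the receiver stays silent.

```c
#include <assert.h>

/* Hypothetical model of the agent, not net-snmp code. */
struct sim {
    int pending;    /* notifications queued by the subagent */
    int depth;      /* current nesting of dispatch calls */
    int max_depth;  /* deepest nesting observed */
};

static void sim_dispatch(struct sim *s);

/* Models a synchronous wait for an inform response: it runs its own
 * read loop. The receiver never answers, so the only input it ever
 * sees is further notifications. The loop ends when none are left,
 * which stands in for the timeout. */
static void sim_wait_sync(struct sim *s)
{
    while (s->pending > 0)
        sim_dispatch(s);
}

/* Models forwarding one notification as an inform. */
static void sim_handle_notification(struct sim *s)
{
    sim_wait_sync(s);
}

/* Models one read: take a notification off the queue and handle it. */
static void sim_dispatch(struct sim *s)
{
    s->pending--;
    s->depth++;
    if (s->depth > s->max_depth)
        s->max_depth = s->depth;
    sim_handle_notification(s);
    s->depth--;
}

/* Runs the outer loop and returns the deepest nesting reached. */
int sim_max_depth(int notifications)
{
    struct sim s = { 0, 0, 0 };

    assert(notifications >= 0);
    s.pending = notifications;
    while (s.pending > 0)
        sim_dispatch(&s);
    return s.max_depth;
}
```

In this model sim_max_depth(n) returns n: the nesting grows by one level for every notification that arrives while an earlier inform is still waiting, so a steady stream of notifications to a dead receiver uses up the stack.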
It seems to me that the only viable solution is to change the way informs
are processed - the timeout check and retransmission should be done in a
programmed alarm, not by taking over the entire control flow of the
agent. But this seems to be a major undertaking - if I am to do it
myself, I could use a few pieces of good advice first!
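A rough sketch of the kind of structure I have in mind follows. It is a self-contained model with a simulated clock, not a patch: the table, the function names and the fixed table size are all assumptions made for illustration, and the actual sending of the PDU is reduced to a counter.

```c
#include <assert.h>
#include <string.h>

#define MAX_INFORMS 8            /* hypothetical fixed table size */

struct inform {
    int  in_use;
    int  id;                     /* request id of the inform */
    long deadline;               /* when to retransmit or give up */
    int  retries_left;
};

struct inform_table {
    struct inform slot[MAX_INFORMS];
    long timeout;                /* interval between transmissions */
    int  sent;                   /* transmissions, retransmissions included */
    int  failed;                 /* informs given up on */
};

void inform_table_init(struct inform_table *t, long timeout)
{
    assert(t != NULL && timeout > 0);
    memset(t, 0, sizeof *t);
    t->timeout = timeout;
}

/* Record an inform as sent and return at once; 0 on success, -1 if full. */
int inform_send(struct inform_table *t, int id, long now, int retries)
{
    int i;

    for (i = 0; i < MAX_INFORMS; i++) {
        struct inform *p = &t->slot[i];
        if (!p->in_use) {
            p->in_use = 1;
            p->id = id;
            p->deadline = now + t->timeout;
            p->retries_left = retries;
            t->sent++;
            return 0;
        }
    }
    return -1;
}

/* Called when a response arrives; returns 1 if the id was outstanding. */
int inform_ack(struct inform_table *t, int id)
{
    int i;

    for (i = 0; i < MAX_INFORMS; i++) {
        if (t->slot[i].in_use && t->slot[i].id == id) {
            t->slot[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* Called from the alarm: retransmit or drop expired informs.
 * Returns the number of informs still outstanding. */
int inform_tick(struct inform_table *t, long now)
{
    int i, outstanding = 0;

    for (i = 0; i < MAX_INFORMS; i++) {
        struct inform *p = &t->slot[i];
        if (!p->in_use)
            continue;
        if (p->deadline <= now) {
            if (p->retries_left > 0) {
                p->retries_left--;
                p->deadline = now + t->timeout;
                t->sent++;               /* retransmission */
            } else {
                p->in_use = 0;           /* give up */
                t->failed++;
                continue;
            }
        }
        outstanding++;
    }
    return outstanding;
}
```

The point of the design is that inform_send returns immediately and all waiting happens in the main loop, so handling one notification never nests inside the handling of another. Whether the existing alarm and asynchronous send facilities in the library can carry this directly is exactly what I would like advice on.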
best regards
--
Peder Chr. Nørgaard Senior System Developer, M. Sc.
Ericsson Denmark A/S, Telebit Division
Skanderborgvej 232 tel: +45 30 91 84 31
DK-8260 Viby J, Denmark fax: +45 89 38 51 01
e-mail: [EMAIL PROTECTED]
(old e-mail 2000-2003: [EMAIL PROTECTED])
(old e-mail 1992-2000: [EMAIL PROTECTED])
_______________________________________________
Net-snmp-coders mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders