Hello.
        My snmpd agent has started crashing on me, and I have spent some
time figuring out how and why.  I think I understand the cause now, but
not how to fix it.

        My problem may be related to bug 1154117 in the bug database.
The behaviour is not identical, however.

        The version is Debian package 5.2.1.2-2, on an Intel processor.

        The crash happens shortly after a subagent has sent a notification
that must be forwarded as an inform.  The crash only happens when the
receiver is not responding, so the inform times out; the snmpd reports the
timeout to syslog, and shortly after the snmpd crashes.

        The crash is a stack overrun; gdb shows that 14 functions are
calling each other recursively - when the crash happens, the stack is more
than 300 function calls deep.  This is the bottom of the stack:

#311 0xb7dd938c in snmp_read () at snmp_api.c:5260
#312 0xb7daf908 in snmp_synch_response_cb () at snmp_client.c:813
#313 0xb7daf9f7 in snmp_synch_response () at snmp_client.c:851
#314 0xb7e73565 in send_trap_to_sess () at agent_trap.c:829
#315 0xb7ecea09 in send_notifications () at snmpNotifyTable.c:127
#316 0xb7df63a3 in snmp_call_callbacks () at callback.c:224
#317 0xb7e73081 in netsnmp_send_traps () at agent_trap.c:778
#318 0xb7e73487 in send_enterprise_trap_vars () at agent_trap.c:792
#319 0xb7e73675 in send_trap_vars () at agent_trap.c:849
#320 0xb7e8b26c in agentx_notify () at master_admin.c:440
#321 0xb7e8b73f in handle_master_agentx_packet () at master_admin.c:532
#322 0xb7dd8ed8 in _sess_process_packet () at snmp_api.c:? (as callback, more than one callback available)
#323 0xb7dd974e in _sess_read () at snmp_api.c:5526
#324 0xb7dda569 in snmp_sess_read () at snmp_api.c:5624
#325 0xb7dd938c in snmp_read () at snmp_api.c:5260
#326 0x0804be90 in receive () at snmpd.c:1149
#327 0x0804b298 in main (argc=7, argv=0xbffffd64) at snmpd.c:993

(I have edited this slightly, to include source file and line numbers for
functions in the library.)

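The problem seems to be a conflict between two coding styles:  The snmpd
agent has its main "select" loop in the "receive" function, but when an
inform is processed, the function "send_trap_to_sess" uses
"snmp_synch_response_cb", which runs its own select loop and calls
"snmp_read" itself (frames #311-#312 above).  Incoming packets are then
dispatched from inside the inform wait instead of from "receive", so the
inner loop "steals" the control flow and the stack grows with every
nested notification.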

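To illustrate the re-entrancy, here is a minimal sketch in plain C.  It is
a model only and uses no net-snmp code:  all names in it (struct model,
read_and_dispatch, send_inform_synchronously, nested_depth_for) are made
up for the illustration.  It also assumes that each nested level is caused
by another subagent notification arriving while an earlier inform is still
waiting for its response, which I have not verified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the agent; not net-snmp code. */
struct model {
    int pending_notifications; /* subagent notifications not yet read */
    int depth;                 /* current nesting of read_and_dispatch() */
    int max_depth;             /* deepest nesting seen so far */
};

static void read_and_dispatch(struct model *m);

/*
 * Stands in for the synchronous inform send.  The receiver never answers,
 * so the inner wait loop keeps servicing incoming packets itself instead
 * of returning to the main loop.
 */
static void send_inform_synchronously(struct model *m)
{
    while (m->pending_notifications > 0)
        read_and_dispatch(m);
}

/* Stands in for reading one packet and running its handler. */
static void read_and_dispatch(struct model *m)
{
    if (m->pending_notifications == 0)
        return;

    m->pending_notifications--;
    m->depth++;
    if (m->depth > m->max_depth)
        m->max_depth = m->depth;

    /* The handler forwards the notification as an inform and blocks. */
    send_inform_synchronously(m);

    m->depth--;
    assert(m->depth >= 0);
}

/* Main loop: returns the deepest handler nesting that was reached. */
int nested_depth_for(int notifications)
{
    struct model m = { notifications, 0, 0 };

    while (m.pending_notifications > 0)
        read_and_dispatch(&m);

    return m.max_depth;
}
```

In this model the nesting depth equals the number of notifications that
arrive during the wait, so nothing bounds the stack.  With 14 real frames
per level, about 22 nested notifications would give the 300-plus frames
that gdb shows.
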
It seems to me that the only viable solution is to change the way informs
are processed:  the timeout check and retransmission should be driven by a
scheduled alarm, not by taking over the entire control flow of the agent.
But this seems to be a major undertaking; if I am to do it myself, I could
use some good advice first!
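
The kind of structure I have in mind could be sketched as below.  This is
only an illustration under my own assumptions, not a patch:  the table,
its size and every function name (inform_table_init, inform_start,
inform_acknowledge, inform_on_alarm, inform_pending_count) are
hypothetical, time is a plain counter, and the actual sending is left out.
The point is that starting an inform returns at once, and a periodic alarm
callback does the timeout check and retransmission.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8 /* arbitrary size for the sketch */

struct pending_inform {
    int  in_use;
    int  request_id;
    long deadline;     /* time at which the inform counts as timed out */
    int  retries_left;
    int  sends;        /* transmissions so far, including the first */
};

struct inform_table {
    struct pending_inform slot[MAX_PENDING];
    long timeout;      /* time allowed for each transmission */
};

void inform_table_init(struct inform_table *t, long timeout)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        t->slot[i].in_use = 0;
    t->timeout = timeout;
}

/* Record an inform and return at once: 0 on success, -1 if table is full. */
int inform_start(struct inform_table *t, int request_id, long now, int retries)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_inform *p = &t->slot[i];
        if (!p->in_use) {
            p->in_use = 1;
            p->request_id = request_id;
            p->deadline = now + t->timeout;
            p->retries_left = retries;
            p->sends = 1; /* the first transmission would happen here */
            return 0;
        }
    }
    return -1;
}

/* Called from the normal read path when a response arrives: 1 if matched. */
int inform_acknowledge(struct inform_table *t, int request_id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_inform *p = &t->slot[i];
        if (p->in_use && p->request_id == request_id) {
            p->in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* Alarm callback: retransmit or drop expired informs; returns number dropped. */
int inform_on_alarm(struct inform_table *t, long now)
{
    int given_up = 0;

    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_inform *p = &t->slot[i];
        if (!p->in_use || now < p->deadline)
            continue;
        if (p->retries_left > 0) {
            p->retries_left--;
            p->sends++; /* the retransmission would happen here */
            p->deadline = now + t->timeout;
        } else {
            p->in_use = 0; /* report the timeout and forget the inform */
            given_up++;
        }
    }
    return given_up;
}

int inform_pending_count(const struct inform_table *t)
{
    int n = 0;
    for (size_t i = 0; i < MAX_PENDING; i++)
        n += t->slot[i].in_use ? 1 : 0;
    return n;
}
```

With this shape no handler ever waits for a response, so the only select
loop left is the one in "receive" and the stack depth stays constant no
matter how many informs are outstanding.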

best regards
--
Peder Chr. Nørgaard             Senior System Developer, M. Sc.
Ericsson Denmark A/S, Telebit Division
Skanderborgvej 232              tel: +45 30 91 84 31
DK-8260 Viby J, Denmark         fax: +45 89 38 51 01
        e-mail: [EMAIL PROTECTED]
(old e-mail 2000-2003: [EMAIL PROTECTED])
         (old e-mail 1992-2000: [EMAIL PROTECTED])


_______________________________________________
Net-snmp-coders mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
