For some time now, we have been having problems with snmpd dying under some
(not yet fully understood) circumstances.
What I can see from core dumps, traces and tcpdump output suggests that
the problem occurs when
1. an agentx subagent is non-responsive (it first registers its OIDs,
then tries to contact another service which is not available at that
time, so any requests coming in will not be processed)
2. a frontend sends GET _and_ GETNEXT requests destined for this
subagent in quick succession (retries are not always spaced 5s
apart)
There are several bug reports that suggest the same:
[ 1097029 ] snmpd dies during snmpwalk and (dis)connecting subagent
[ 1491604 ] snmpd crash with getnext
[ 1574285 ] snmpd crash when agentx subagent crash
[ 1565703 ] SNMPD crash in net-snmp-5.2.2
[ 1413728 ] get/getnext with multiple varbind against table
[ 1403948 ] 5.1.3.1 snmpd crashes shortly after startup
When snmpd tries to close the agentx session, it calls
unregister_mibs_by_session, which calls netsnmp_subtree_free to free the
subtrees in the context list. Later,
netsnmp_remove_delegated_requests_for_session is called, which checks
request->subtree->session. However, request->subtree is now a stale
pointer (under Electric Fence, snmpd crashes on this comparison).
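To illustrate the ordering problem, here is a minimal sketch using simplified, hypothetical structures. struct session, struct subtree, struct request, cancel_requests_for_session() and close_session() are made up for illustration and are not net-snmp code. The sketch assumes pending requests only borrow their subtree pointer, and shows one possible safe ordering: detach delegated requests from the closing session first, then free its subtrees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real net-snmp definitions. */
struct session { int id; };

struct subtree {
    struct session *session;   /* session that registered this subtree */
};

struct request {
    struct subtree *subtree;   /* borrowed pointer, not owned by the request */
    int delegated;             /* still waiting for the subagent's answer */
};

/* Cancel pending requests delegated to the closing session. Plays a role
 * similar to netsnmp_remove_delegated_requests_for_session, but runs while
 * every request->subtree still points at live memory. */
static int cancel_requests_for_session(struct request *reqs, size_t n,
                                       const struct session *sess)
{
    int cancelled = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].delegated && reqs[i].subtree != NULL &&
            reqs[i].subtree->session == sess) {
            reqs[i].delegated = 0;
            reqs[i].subtree = NULL;   /* drop the reference before the free */
            cancelled++;
        }
    }
    return cancelled;
}

/* Detach requests first, then free the session's subtrees. Swapping these
 * two steps reproduces the use-after-free described above. */
static int close_session(struct request *reqs, size_t n, struct session *sess,
                         struct subtree **trees, size_t ntrees)
{
    assert(sess != NULL);
    int cancelled = cancel_requests_for_session(reqs, n, sess);
    for (size_t i = 0; i < ntrees; i++) {
        if (trees[i] != NULL && trees[i]->session == sess) {
            free(trees[i]);
            trees[i] = NULL;
        }
    }
    return cancelled;
}
```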
I added some code to netsnmp_add_varbind_to_cache() that assigns a
netsnmp_subtree_deepcopy(tp) to request->subtree and also uses the members
of _that_ instance. This gets past the comparison mentioned above, but I
suspect it creates a memory leak.
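The deep-copy workaround avoids the crash because each request then owns memory that unregister_mibs_by_session cannot free; the price is that something must free that copy once the request is finished, or it leaks. A minimal sketch of that ownership rule follows. It assumes a flat, pointer-free subtree struct (the real netsnmp_subtree contains pointers, so a real deep copy has more to allocate and free), and struct subtree, struct request, subtree_copy(), request_init() and request_free() are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat subtree: no internal pointers, so memcpy is a full copy. */
struct subtree {
    int  session_id;
    char name[32];
};

/* Hypothetical request that OWNS its private subtree copy. */
struct request {
    struct subtree *subtree;
};

static struct subtree *subtree_copy(const struct subtree *src)
{
    struct subtree *dst = malloc(sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src, sizeof *dst);
    return dst;
}

/* Take a private copy so the request no longer depends on the registry's
 * subtree staying alive. Returns 0 on success, -1 on allocation failure. */
static int request_init(struct request *r, const struct subtree *tp)
{
    assert(r != NULL && tp != NULL);
    r->subtree = subtree_copy(tp);
    return r->subtree != NULL ? 0 : -1;
}

/* Every successful request_init() must be paired with exactly one
 * request_free(); skipping it is the suspected leak. */
static void request_free(struct request *r)
{
    free(r->subtree);
    r->subtree = NULL;
}
```

The copy survives the original being freed, which is why the comparison no longer crashes; the leak question then reduces to whether every code path that discards a request also releases its copy.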
Also, snmpd now dies when netsnmp_handler_mark_requests_as_delegated is
called, as some netsnmp_request_info has already been freed.
If we do not run snmpd under Electric Fence, we get abort() calls in
free(), which means that the heap (malloc/free) is corrupt. Maybe some
struct is freed twice, or a struct is freed while a stale pointer to it
is still being used and the already freed memory is written to?
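Since the underlying issue appears to be a subtree being freed while requests still point at it, one alternative (a swapped-in technique, not something net-snmp is known to do) is reference counting: each pending request takes a reference, and the memory is freed only when the last holder releases it. A minimal sketch, assuming single-threaded use so the counter needs no locking; struct subtree, subtree_new(), subtree_ref() and subtree_unref() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted subtree; not the net-snmp structure. */
struct subtree {
    int refcount;
};

/* The creator (the registry) holds the first reference. */
static struct subtree *subtree_new(void)
{
    struct subtree *t = calloc(1, sizeof *t);
    if (t != NULL)
        t->refcount = 1;
    return t;
}

/* A pending request takes its own reference before using the subtree. */
static struct subtree *subtree_ref(struct subtree *t)
{
    assert(t != NULL && t->refcount > 0);
    t->refcount++;
    return t;
}

/* Drop one reference. Returns 1 if this call released the memory, else 0.
 * After a return of 1 the pointer must not be used again. */
static int subtree_unref(struct subtree *t)
{
    assert(t != NULL && t->refcount > 0);
    if (--t->refcount == 0) {
        free(t);
        return 1;
    }
    return 0;
}
```

In the scenario above, unregister_mibs_by_session would then drop the registry's reference instead of freeing outright, so a later check of request->subtree->session would still read valid memory until the request itself lets go.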
The problem has been seen on 5.1.2, but I have been able to reproduce it
with net-snmp-cvs-MAIN_20061023_0318.tar.gz.
It has been seen on 32- and 64-bit Linux systems,
it has been seen on RedHat and Novell/SuSE systems,
it has been seen on 5.1.2, 5.2.2 and 5.3.0.
Josef
--
Josef Möllers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize
-- T. Pratchett
_______________________________________________
Net-snmp-coders mailing list
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders