For some time now, we have been having problems with snmpd dying under 
some (not yet fully understood) circumstances.

What I can see from core dumps, traces and tcpdump output suggests that 
the problem occurs when
1. an AgentX subagent is non-responsive (it first registers its OIDs, 
then tries to contact another service which is not available at that 
time, so any requests coming in are not processed), and
2. a frontend sends GET _and_ GETNEXT requests destined for this 
subagent in quick succession (not always with a 5s interval between 
retries).

There are several bug reports that suggest the same:
[ 1097029 ] snmpd dies during snmpwalk and (dis)connecting subagent
[ 1491604 ] snmpd crash with getnext
[ 1574285 ] snmpd crash when agentx subagent crash
[ 1565703 ] SNMPD crash in net-snmp-5.2.2
[ 1413728 ] get/getnext with multiple varbind against table
[ 1403948 ] 5.1.3.1 snmpd crashes shortly after startup

When snmpd tries to close the AgentX session, it calls 
unregister_mibs_by_session, which calls netsnmp_subtree_free to free 
the subtrees in the context list. Later, 
netsnmp_remove_delegated_requests_for_session is called, which checks 
request->subtree->session. However, request->subtree is by then a stale 
pointer (Electric Fence makes snmpd fault on this comparison).
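
To make the ordering problem concrete, here is a minimal sketch of the pattern, assuming simplified stand-in structures. The types and the functions detach_requests_for_session, free_subtrees_for_session and close_session_demo are hypothetical and are not the net-snmp API. It only illustrates that any scan which dereferences request->subtree has to run before the subtrees are freed; it is not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real net-snmp structures. */
struct session { int id; };

struct subtree {
    struct session *session;   /* session that registered this subtree */
    struct subtree *next;
};

struct request {
    struct subtree *subtree;   /* borrowed pointer into the subtree list */
    int delegated;
    struct request *next;
};

/* Step 1: detach every delegated request that refers to the closing
 * session. This dereferences request->subtree, so it must run while the
 * subtrees are still allocated. Returns the number of requests detached. */
static int detach_requests_for_session(struct request *reqs,
                                       const struct session *s)
{
    int n = 0;
    for (struct request *r = reqs; r != NULL; r = r->next) {
        if (r->delegated && r->subtree != NULL && r->subtree->session == s) {
            r->subtree = NULL;
            r->delegated = 0;
            n++;
        }
    }
    return n;
}

/* Step 2: unlink and free the subtrees owned by the session.
 * Returns the new list head and counts the freed nodes in *freed. */
static struct subtree *free_subtrees_for_session(struct subtree *head,
                                                 const struct session *s,
                                                 int *freed)
{
    struct subtree **link = &head;
    while (*link != NULL) {
        struct subtree *t = *link;
        if (t->session == s) {
            *link = t->next;
            free(t);
            (*freed)++;
        } else {
            link = &t->next;
        }
    }
    return head;
}

/* Closes session "a" in the safe order: requests first, subtrees second.
 * Returns 0 if the bookkeeping is as expected, non-zero otherwise. */
int close_session_demo(void)
{
    struct session a = { 1 }, b = { 2 };
    struct subtree *ta = malloc(sizeof *ta);
    struct subtree *tb = malloc(sizeof *tb);
    if (ta == NULL || tb == NULL) { free(ta); free(tb); return -1; }
    ta->session = &a; ta->next = tb;
    tb->session = &b; tb->next = NULL;

    struct request r2 = { tb, 1, NULL };
    struct request r1 = { ta, 1, &r2 };

    int freed = 0;
    int detached = detach_requests_for_session(&r1, &a);
    struct subtree *head = free_subtrees_for_session(ta, &a, &freed);

    int ok = detached == 1 && freed == 1 &&
             r1.subtree == NULL && r2.subtree == tb && head == tb;
    free(tb);
    return ok ? 0 : 1;
}
```

Swapping the two steps in close_session_demo reproduces the situation described above: the request scan would then read through a pointer to memory that has already been freed.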

I added some code to netsnmp_add_varbind_to_cache() to assign a 
netsnmp_subtree_deepcopy(tp) to request->subtree and to use the members 
of _that_ instance. This works around the comparison mentioned above, 
but I suppose it creates a memory leak.
Also, snmpd now dies when netsnmp_handler_mark_requests_as_delegated is 
called, as some netsnmp_request_info has already been freed.
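
The leak concern with the deep copy comes down to ownership: once a request holds its own copy, something has to free that copy exactly once when the request goes away. The following is a minimal sketch of that rule, assuming hypothetical stand-in types (subtree_copy, cached_request) and helper names that are not part of net-snmp. The live_allocs counter is only a crude demo aid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real net-snmp types. */
struct subtree_copy {
    char *label;
};

struct cached_request {
    struct subtree_copy *subtree;  /* owned deep copy, freed with the request */
};

static int live_allocs = 0;        /* crude leak counter for this demo only */

/* Allocates a private copy; returns NULL on allocation failure. */
static struct subtree_copy *subtree_copy_new(const char *label)
{
    struct subtree_copy *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    size_t n = strlen(label) + 1;
    c->label = malloc(n);
    if (c->label == NULL) {
        free(c);
        return NULL;
    }
    memcpy(c->label, label, n);
    live_allocs += 2;
    return c;
}

static void subtree_copy_free(struct subtree_copy *c)
{
    if (c == NULL)
        return;
    free(c->label);
    free(c);
    live_allocs -= 2;
}

/* Releases the owned copy and clears the pointer, so that a second
 * release is a harmless no-op instead of a double free. */
static void cached_request_release(struct cached_request *r)
{
    subtree_copy_free(r->subtree);
    r->subtree = NULL;
}

/* Returns the number of outstanding allocations (0 means no leak),
 * or -1 if the initial allocation failed. */
int deep_copy_demo(void)
{
    struct cached_request r = { subtree_copy_new("agentx-subtree") };
    if (r.subtree == NULL)
        return -1;
    cached_request_release(&r);
    cached_request_release(&r);    /* deliberately repeated */
    return live_allocs;
}
```

Clearing the pointer after the free is what keeps the repeated release safe. Without it, the second call would free the same block twice, which is one way to end up with the abort() in free() described below.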

If we do not run snmpd under Electric Fence, we get abort() calls in 
free(), which means that the heap (malloc/free bookkeeping) is 
corrupt. Maybe some struct is freed twice, or it is freed, a stale 
pointer to it continues to be used, and already freed memory is written to?

The problem was first seen on 5.1.2, but I have been able to reproduce it 
with net-snmp-cvs-MAIN_20061023_0318.tar.gz.

It has been seen on 32 and 64 bit Linux systems,
it has been seen on RedHat and Novell/SuSE systems,
it has been seen on 5.1.2, 5.2.2 and 5.3.0.

Josef
-- 
Josef Möllers (Penguin keeper at FSC)
        If failure had no penalty success would not be a prize
                                                -- T.  Pratchett

_______________________________________________
Net-snmp-coders mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
