Hi All,
I'm currently trying to chase down a nasty bug in Net-SNMP for my client, and
I've pretty much hit the brick wall of my own understanding of the way things
are supposed to work, so I'm hoping things may make a little more sense to
those who know the code better than I do.
The Scenario:
We've got an application that registers as an AgentX subagent in order to
answer queries for a private MIB related to the application's state.
Platform is MontaVista Linux on x86 (specifically, glibc 2.3.3, kernels
2.6.10-x86 and 2.6.21-x86-64).
We've been experiencing random crashes in the field for some time now, which
seemed to be load-related, and after much tracing and head-scratching, we've
found the culprit to be snmpd. Specifically, the problem appears to be that
under load on our app, the AgentX queries sometimes time out (the application
prioritises its primary function over SNMP, so AgentX queries sometimes get
queued up a bit), and the case where snmpd disconnects the session because of
the timeout is not handled well. Worse, shutting down our app is very likely to
kill snmpd if there are requests outstanding at the point of shutdown (quite
possible if the request load is high).
I've built a test environment that can exercise this bug, so I've been able to
do some investigation:
5.6.1 and 5.7.1 "stock" builds dump core (segfault) when the AgentX connection
times out or disconnects.
We've tried the "subagent_free_cache" patch (which is the same as the patch in
1633670) on both 5.6.1 and 5.7.1 and this results in an infinite loop in the
following code in "agent/mibgroup/agentx/master_admin.c", function
"close_agentx_session()":
    if (session->subsession != NULL) {
        netsnmp_session *subsession = session->subsession;
        for (; subsession; subsession = subsession->next) {
            while (netsnmp_remove_delegated_requests_for_session(subsession)) {
                DEBUGMSGTL(("agentx/master",
                            "Continue removing delegated subsession reqests\n"));
            }
        }
    }
It loops forever on the while, with the return value never decreasing: the log
message (spelling mistake and all) is repeated ad infinitum and snmpd sits at
100% CPU.
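To make the symptom concrete, here is a minimal C sketch of the same "call while the count is non-zero" loop, with a no-progress guard added so it stops instead of spinning. It is only a debugging aid under stated assumptions and does not address whatever leaves the requests stuck. fake_session and fake_remove_delegated() are hypothetical stand-ins, not Net-SNMP types or APIs; the stand-in's return value (requests still outstanding) is a modelling choice, not the real function's contract.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a session: NOT a Net-SNMP type. */
typedef struct {
    int outstanding;   /* delegated requests still visible to the check */
    int removable;     /* how many of those the remover can actually clear */
} fake_session;

/* Hypothetical stand-in for netsnmp_remove_delegated_requests_for_session():
 * clears at most one request per call and returns how many are still
 * outstanding. Once `removable` reaches 0 the count can no longer fall. */
static int fake_remove_delegated(fake_session *s)
{
    if (s->removable > 0) {
        s->removable--;
        s->outstanding--;
    }
    return s->outstanding;
}

/* Same loop shape as in close_agentx_session(), plus a guard that stops
 * as soon as the count fails to decrease. Returns the count left when the
 * loop stopped (0 means everything drained). */
static int drain_with_guard(fake_session *s)
{
    int prev = -1;
    int left;

    assert(s != NULL);
    while ((left = fake_remove_delegated(s)) > 0) {
        if (prev != -1 && left >= prev) {
            fprintf(stderr, "no progress: %d delegated request(s) stuck\n", left);
            break;
        }
        prev = left;
    }
    return left;
}
```

With { 3, 3 } the loop drains to zero; with { 3, 1 } the count sticks at 2, which mirrors the "never decreasing" return value, and the guard breaks out instead of looping forever.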
I've also tried the current trunk version, which has the 1633670 patch already
applied, and get the same behaviour.
After lots of additional debugging, the culprit behaviour appears to be that
"netsnmp_remove_delegated_requests_for_session()" removes (or, more correctly,
uses "netsnmp_request_set_error()" on) everything in the agent_delegated_list
that matches the target session, then calls
"netsnmp_check_outstanding_agent_requests()", which walks the
agent_delegated_list and de-queues anything for which
"netsnmp_check_for_delegated()" reports no remaining delegated requests.
However, there appear to be requests in the subsession list that don't match
the session, so they are still marked as delegated, so their asp never passes
that check, and... repeat until bored.
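The mismatch described above can be reproduced in a toy C model. Everything in it (model_asp, model_remove_for_session(), model_drain()) is hypothetical and deliberately simplified: one asp, integer session ids, and one flag per request standing in for the delegated state. The remover flags only requests owned by the target session, and the asp is only de-queued once nothing in it is still delegated, so a single request owned by another session keeps the asp queued and the loop re-counting the same requests. The iteration cap exists only so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REQS 8

/* Hypothetical, heavily simplified agent session (asp). */
typedef struct {
    int  owner[MAX_REQS];      /* session id each request was delegated to */
    bool delegated[MAX_REQS];  /* still waiting for an answer? */
    int  n;                    /* number of requests in this asp */
    bool queued;               /* still on the delegated list? */
} model_asp;

/* Flag every request owned by `sess` as failed and return how many matched,
 * then de-queue the asp only if nothing in it is still delegated.
 * Already-failed requests are matched (and counted) again on every pass,
 * which is what keeps the count from falling while the asp stays queued. */
static int model_remove_for_session(model_asp *asp, int sess)
{
    int count = 0;
    bool any_delegated = false;

    assert(asp->n <= MAX_REQS);
    if (!asp->queued)
        return 0;
    for (int i = 0; i < asp->n; i++) {
        if (asp->owner[i] != sess)
            continue;                 /* the session test from the real code */
        asp->delegated[i] = false;    /* stands in for netsnmp_request_set_error() */
        count++;
    }
    /* stands in for netsnmp_check_outstanding_agent_requests() */
    for (int i = 0; i < asp->n; i++)
        if (asp->delegated[i])
            any_delegated = true;
    if (!any_delegated)
        asp->queued = false;
    return count;
}

/* The while loop from close_agentx_session(), capped at `cap` passes so
 * the model terminates. Returns the number of passes with a non-zero count. */
static int model_drain(model_asp *asp, int sess, int cap)
{
    int passes = 0;
    while (passes < cap && model_remove_for_session(asp, sess))
        passes++;
    return passes;
}
```

With owners { 1, 1 } the loop runs once and the asp is de-queued; with owners { 1, 2 } every pass returns 1 until the cap is hit.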
I've tried making (and using) a more aggressive flavour of
"netsnmp_remove_delegated_requests_for_session()" that doesn't have the

    if (request->subtree->session != sess)
        continue;

test, but that doesn't fix it! Note that "..check_for_delegated()" checks in
asp->treecache, but "..remove_delegated_requests.." removes the requests from
[agent_delegated_list]->requests, and it appears that in our case the two
don't quite meet up...
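A similarly hypothetical C sketch of the "two don't quite meet up" observation: if the remover and the delegation check walk two different views of the same asp, a request reachable only from the check's view survives even a remover that ignores the session entirely. model_request, model_asp, model_remove_all_delegated() and model_check_for_delegated() are illustrative names only; the real treecache and request lists are much richer than two arrays of pointers, and whether this is what actually happens in snmpd is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REQS 8

/* Hypothetical request: only the "still delegated" flag is modelled. */
typedef struct {
    bool delegated;
} model_request;

/* Hypothetical asp that reaches its requests through two separate views. */
typedef struct {
    model_request *requests[MAX_REQS];  /* view walked by the remover */
    int n_requests;
    model_request *treecache[MAX_REQS]; /* view walked by the delegation check */
    int n_treecache;
} model_asp;

/* "Aggressive" remover: no session test, clears every delegated request
 * reachable from the requests view and returns how many it cleared. */
static int model_remove_all_delegated(model_asp *asp)
{
    int count = 0;

    assert(asp->n_requests <= MAX_REQS);
    for (int i = 0; i < asp->n_requests; i++) {
        if (asp->requests[i]->delegated) {
            asp->requests[i]->delegated = false;
            count++;
        }
    }
    return count;
}

/* Delegation check: consults only the treecache view. */
static bool model_check_for_delegated(const model_asp *asp)
{
    assert(asp->n_treecache <= MAX_REQS);
    for (int i = 0; i < asp->n_treecache; i++)
        if (asp->treecache[i]->delegated)
            return true;
    return false;
}
```

With an extra request present only in the treecache view, the remover clears the two it can see but the check still reports outstanding delegation; when both views hold the same requests, the check comes back clean.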
I've tried writing an even more aggressive version of
"netsnmp_remove_delegated_requests_for_session()" that eats every delegated
request in the treecache, which, to be fair, stops the infinite loop above, but
just causes snmpd to go catatonic elsewhere...
...and that's where my understanding of these inter-related data structures
stops, I'm afraid!
I'm sort of hoping that those that live, eat and breathe this code will have
some suggestions.
Other info that may help:
My test SNMP query set is a set of SNMP GETs and GETNEXTs taken from a customer
network capture. They all hit the MIB that is delegated to our AgentX
subagent; however, some of the GETNEXTs walk off the end of our MIB and into
the next enterprise along (which happens to be the NET-SNMP MIB, in our
particular case).
Ken Farnen.
Agilent don't authorise me to order paperclips, much less speak on their
behalf, I'm just a freelance consultant who happens to sit at one of their
desks at the moment, anything I say is my opinion only, and nothing to do with
my Client!
_______________________________________________
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders