Hi,

I'd like an expert opinion on a very sporadic problem we are
seeing on our test systems (RedHat Linux; NetSNMP version 5.5). 'Very
sporadic' means we've seen it twice in the last three months.
When this problem occurs, our subagent gets stuck in the NetSNMP library
and uses up to 30 percent CPU time. After a process restart, the problem
is gone.

pstack shows the thread stuck here:
...
Thread 1 (Thread 0xf77ab8f0 (LWP 27568)):
#0  0x00552430 in __kernel_vsyscall ()
#1  0x003524d1 in select () from /lib/libc.so.6
#2  0x05108537 in snmp_synch_response_cb () from /usr/lib/libnetsnmp.so.20
#3  0x0051aa36 in agentx_synch_response () from /usr/lib/libnetsnmpagent.so.20
#4  0x0051aab0 in agentx_send_ping () from /usr/lib/libnetsnmpagent.so.20
#5  0x00506e2e in agentx_check_session () from /usr/lib/libnetsnmpagent.so.20
#6  0x0515bcfd in run_alarms () from /usr/lib/libnetsnmp.so.20
...

strace shows the following over and over again:
...
select(63, [58 60 62], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1314108555, 885224}, NULL) = 0
gettimeofday({1314108555, 885263}, NULL) = 0
gettimeofday({1314108555, 885300}, NULL) = 0
gettimeofday({1314108555, 885344}, NULL) = 0
gettimeofday({1314108555, 885380}, NULL) = 0
gettimeofday({1314108555, 885417}, NULL) = 0
select(63, [58 60 62], NULL, NULL, {0, 1}) = 0 (Timeout)
gettimeofday({1314108555, 885620}, NULL) = 0
gettimeofday({1314108555, 885682}, NULL) = 0
gettimeofday({1314108555, 885719}, NULL) = 0
gettimeofday({1314108555, 885765}, NULL) = 0
gettimeofday({1314108555, 885801}, NULL) = 0
gettimeofday({1314108555, 885837}, NULL) = 0
...
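
To put a number on how tight the loop is, the gettimeofday timestamps in
the strace excerpt above can be diffed. A minimal Python sketch, assuming
strace output in exactly the format shown; the function name
gettimeofday_gaps_us is made up for illustration:

```python
import re

# Matches the timeval strace prints for gettimeofday,
# e.g. gettimeofday({1314108555, 885224}, NULL) = 0
_GTOD = re.compile(r"gettimeofday\(\{(\d+), (\d+)\}")


def gettimeofday_gaps_us(strace_text):
    """Return the gaps (in microseconds) between consecutive gettimeofday calls."""
    stamps = [int(sec) * 1_000_000 + int(usec)
              for sec, usec in _GTOD.findall(strace_text)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# The two loop iterations quoted in this mail.
sample = """\
select(63, [58 60 62], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1314108555, 885224}, NULL) = 0
gettimeofday({1314108555, 885263}, NULL) = 0
gettimeofday({1314108555, 885300}, NULL) = 0
gettimeofday({1314108555, 885344}, NULL) = 0
gettimeofday({1314108555, 885380}, NULL) = 0
gettimeofday({1314108555, 885417}, NULL) = 0
select(63, [58 60 62], NULL, NULL, {0, 1}) = 0 (Timeout)
gettimeofday({1314108555, 885620}, NULL) = 0
gettimeofday({1314108555, 885682}, NULL) = 0
gettimeofday({1314108555, 885719}, NULL) = 0
gettimeofday({1314108555, 885765}, NULL) = 0
gettimeofday({1314108555, 885801}, NULL) = 0
gettimeofday({1314108555, 885837}, NULL) = 0
"""

gaps = gettimeofday_gaps_us(sample)
print(gaps)
print("total span in microseconds:", sum(gaps))
```

On this excerpt the twelve timestamps span 613 microseconds, so select is
returning almost immediately each time instead of sleeping.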

Now, this patch description:
http://sourceforge.net/tracker/index.php?func=detail&aid=3042770&group_id=12694&atid=312694
looks, to me at least, very much like what we are seeing. I've verified
with gdb that the process really is running in an endless loop inside
snmp_synch_response_cb (it never re-enters snmp_synch_response_cb from
the top, but it keeps calling snmp_select_info). My rudimentary NetSNMP
knowledge wasn't up to checking in more detail, though.
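
For context, the {0, 0} and {0, 1} timeouts in the strace output mean that
select is effectively polling rather than sleeping, which would explain
the CPU usage. The Python sketch below only illustrates that general
effect on an idle pipe. It is not the NetSNMP code, and count_polls and the
chosen durations are assumptions for illustration:

```python
import os
import select
import time


def count_polls(read_fd, timeout_s, duration_s):
    """Count how many select() calls complete within duration_s seconds."""
    calls = 0
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        # Nothing is ever written to the pipe, so select() can only time out.
        readable, _, _ = select.select([read_fd], [], [], timeout_s)
        assert not readable
        calls += 1
    return calls


r, w = os.pipe()
try:
    spinning = count_polls(r, 0.0, 0.2)   # zero timeout: returns at once
    blocking = count_polls(r, 0.05, 0.2)  # 50 ms timeout: sleeps in the kernel
finally:
    os.close(r)
    os.close(w)

print("zero-timeout calls:", spinning)
print("50 ms-timeout calls:", blocking)
```

With a zero timeout the loop completes far more select calls in the same
wall-clock time, because every call returns immediately and the process
never yields the CPU.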

My question now is, can somebody here give an educated guess whether we
*are* seeing the problem solved by that patch? If it ever occurs again,
what more could we check to verify whether it's that problem?
I'd also appreciate a hint as to what triggers the problem; that
wasn't clear to me from the patch description. Is it just
bad luck in timing?

Thanks and Regards,
Martina


_______________________________________________
Net-snmp-coders mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders
