Hello all,

A few weeks ago I noticed that I had a problem with snmpd core dumping.

My environment is as follows:

net-snmp-5.4.2.1
build options 
solaris 10 SunOS xxxxxx 5.10 Generic_127111-11 sun4v sparc
SUNW,SPARC-Enterprise-T5220

When I examine the core, I get the following backtrace:

(gdb) bt
#0  netsnmp_cpu_arch_load (cache=<value optimized out>, magic=<value optimized out>) at hardware/cpu/cpu_kstat.c:113
#1  0x000a3684 in _cpu_update_stats (reg=8212, magic=0x2010) at hardware/cpu/cpu.c:207
#2  0x000f9e50 in run_alarms () at snmp_alarm.c:252
#3  0x0002e8a8 in main (argc=<value optimized out>, argv=<value optimized out>) at snmpd.c:1229

The crash is caused by an attempt to dereference cpu2, which is NULL:

(gdb) print cpu2
$1 = (netsnmp_cpu_info *) 0x0

Checking back through the code a few lines, cpu2 is set via the call to
netsnmp_cpu_get_byIdx(i, 0), where:

(gdb) print i
$2 = 63



Tracing down through the list starting at _cpu_head:
(gdb) print *_cpu_head
$4 = {idx = -1, name = "Overall CPU statistics", '\0' <repeats 4073 times>, descr = '\0' <repeats 4095 times>, status = 0,
  user_ticks = 99983376, nice_ticks = 0, sys_ticks = 0, idle_ticks = 932293758, wait_ticks = 0, kern_ticks = 1961721,
  intrpt_ticks = 0, sirq_ticks = 0, total_ticks = 1034238355, sys2_ticks = 1961721, pageIn = 0, pageOut = 0, swapIn = 0,
  swapOut = 0, nInterrupts = 144158917, nCtxSwitches = 191974805, history = 0x34eea8, next = 0x2752e8}
(gdb) print _cpu_head->next
$8 = (netsnmp_cpu_info *) 0x2752e8
(gdb) print * _cpu_head->next
$9 = {idx = 27, name = "cpu27", '\0' <repeats 4090 times>,
  descr = "CPU 27 Sun 1167 MHz sparcv9 with sparcv9 FPU on-line", '\0' <repeats 4043 times>, status = 2,
  user_ticks = 28659884, nice_ticks = 0, sys_ticks = 0, idle_ticks = 1003433355, wait_ticks = 0, kern_ticks = 1172789,
  intrpt_ticks = 0, sirq_ticks = 0, total_ticks = 1033266028, sys2_ticks = 1172789, pageIn = 0, pageOut = 0, swapIn = 0,
  swapOut = 0, nInterrupts = 0, nCtxSwitches = 0, history = 0x353870, next = 0x277638}
(gdb) print * _cpu_head->next->next
$10 = {idx = 50, name = "cpu50", '\0' <repeats 4090 times>,
  descr = "CPU 50 Sun 1167 MHz sparcv9 with sparcv9 FPU on-line", '\0' <repeats 4043 times>, status = 2,
  user_ticks = 99983376, nice_ticks = 0, sys_ticks = 0, idle_ticks = 932293758, wait_ticks = 0, kern_ticks = 1961721,
  intrpt_ticks = 0, sirq_ticks = 0, total_ticks = 1034238355, sys2_ticks = 1961721, pageIn = 0, pageOut = 0, swapIn = 0,
  swapOut = 0, nInterrupts = 0, nCtxSwitches = 0, history = 0x353a88, next = 0x0}

On the host itself, however, psrinfo currently reports a different set of CPUs:

# /usr/sbin/psrinfo
50      on-line   since 09/28/2008 15:46:33
63      on-line   since 09/28/2008 15:46:33

What I think has happened is that the cpu... structures were
instantiated from the set of CPUs allocated to the zone at startup
(27 and 50), and that set has subsequently been changed by Solaris
dynamic reconfiguration (psrinfo now shows 50 and 63).  When the new
processor ID (63) is introduced to the zone, it is not found in the
list, so netsnmp_cpu_get_byIdx(i, 0) returns a NULL pointer.
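
To make the suspected failure mode concrete, here is a minimal standalone
sketch. It is not the net-snmp source: cpu_info, cpu_get_by_idx() and
cpu_update() are hypothetical stand-ins, and I am assuming that the second
argument of the real lookup means "create the entry if it is missing". It
shows a lookup that returns NULL for an unknown CPU id and a caller that
checks for NULL before touching the entry.

```c
#include <stdlib.h>

/* Hypothetical stand-in for netsnmp_cpu_info; only the fields needed here. */
typedef struct cpu_info {
    int              idx;
    long             user_ticks;
    struct cpu_info *next;
} cpu_info;

cpu_info *cpu_head = NULL;

/*
 * Look up a CPU entry by id.  On a miss, append a new zeroed entry if
 * 'create' is non-zero (assumed semantics), otherwise return NULL.
 */
cpu_info *cpu_get_by_idx(int idx, int create)
{
    cpu_info *cpu, *last = NULL;

    for (cpu = cpu_head; cpu != NULL; cpu = cpu->next) {
        if (cpu->idx == idx)
            return cpu;
        last = cpu;
    }
    if (!create)
        return NULL;

    cpu = calloc(1, sizeof(*cpu));
    if (cpu == NULL)
        return NULL;
    cpu->idx = idx;
    if (last != NULL)
        last->next = cpu;
    else
        cpu_head = cpu;
    return cpu;
}

/*
 * Store new statistics for one CPU id.
 * Returns 0 on success, -1 if the id is unknown (or allocation failed).
 */
int cpu_update(int idx, long user_ticks, int create_on_miss)
{
    cpu_info *cpu = cpu_get_by_idx(idx, create_on_miss);

    if (cpu == NULL)
        return -1;          /* guard: skip instead of dereferencing NULL */
    cpu->user_ticks = user_ticks;
    return 0;
}
```

With a list holding ids 27 and 50, cpu_update(63, ..., 0) returns -1 rather
than crashing, and cpu_update(63, ..., 1) adds the new CPU. Whether skipping
or creating on a miss is the right behaviour for snmpd is an open question.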

Does this make sense?  
Has anyone else seen this behaviour?
Any ideas on how to fix it?




---
Nick Hindley
Senior Unix Systems Analyst
Hammersmith And Fulham Bridge Partnership
nick.hind...@hfbp.co.uk
T: (020) 8753 2926



_______________________________________________
Net-snmp-users mailing list
Net-snmp-users@lists.sourceforge.net
Please see the following page to unsubscribe or change other options:
https://lists.sourceforge.net/lists/listinfo/net-snmp-users
