Hi, I have one workstation (HP xw4300) with Solaris 10 (x86) and one Digi Sync570i card. The system may hang at any time, after anywhere from a few minutes to a couple of hours, while the card is receiving data frames.
I suspect the hang is caused by the driver module for the Sync570; however, the same driver works properly on a Solaris 8 system. We used to install Solaris 8 on the HP xw4100, but now we have to install Solaris 10 on the HP xw4300 (the HP xw4100 is no longer available on the market).

I boot the Solaris system with kmdb loaded. After the system hangs I can't ping the host, and the keyboard and mouse do not respond. I can get a crash dump file by pressing "F1+A" and then entering "$<systemdump". Analysing the crash dump file, I can't find problems such as 'mutex deadlock' or 'bad trap'. I really don't know what to do next!

# crashdump files can be downloaded from the following URLs :
# www.ras.com.cn/rivanwang/crash_4.tar.gz
# www.ras.com.cn/rivanwang/crash_8.tar.gz
# www.ras.com.cn/rivanwang/crash_7_nor.tar.gz
# "crash_7_nor.tar.gz" was generated before the system hang happened.

I have some questions as follows. Would you be so kind as to give me some suggestions?

[[ Q1 ]]

I can't find the kernel thread that serves the Sync570 module by using the command "::threadlist -v". But I can get the LOADADDR:

::modinfo !grep Sync
161 feba4340 cc60 1 dsync (Sync/570 Device Driver)

How can I find the address of the kernel thread that serves the Sync570 module?
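As a rough aid for relating addresses seen in the dump to the driver, the ::modinfo row above can be turned into an address range and any address of interest tested against it. This is only a sketch and not an answer to Q1: the helper names `parse_modinfo_line` and `in_module` are made up for illustration, and it assumes the module occupies the single contiguous range [LOADADDR, LOADADDR + SIZE), which may be a simplification. The sample address 0xd94c62bc is the frame that ::findstack reports for thread d2ca0de0 later in this post.

```python
from typing import NamedTuple


class ModInfo(NamedTuple):
    mod_id: int    # ID column (decimal)
    loadaddr: int  # LOADADDR column (hex)
    size: int      # SIZE column (hex)
    rev: int       # REV column (decimal)
    name: str      # module name plus description


def parse_modinfo_line(line: str) -> ModInfo:
    """Parse one data row of ::modinfo output (hypothetical helper)."""
    fields = line.split(None, 4)
    return ModInfo(
        mod_id=int(fields[0]),
        loadaddr=int(fields[1], 16),
        size=int(fields[2], 16),
        rev=int(fields[3]),
        name=fields[4],
    )


def in_module(mod: ModInfo, addr: int) -> bool:
    # Assumption: the module is one contiguous range starting at LOADADDR.
    return mod.loadaddr <= addr < mod.loadaddr + mod.size


dsync = parse_modinfo_line("161 feba4340 cc60 1 dsync (Sync/570 Device Driver)")
print(hex(dsync.loadaddr), hex(dsync.loadaddr + dsync.size))
print(in_module(dsync, 0xD94C62BC))
```

Under that assumption the range ends at feba4340 + cc60 = febb0fa0, so an address such as 0xd94c62bc lies outside it. That alone does not say which code the address belongs to.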

[[ Q2 ]]

::msgbuf

panic[cpu0]/thread=d2c84de0:
BAD TRAP: type=e (#pf Page fault) rp=d2c84cec addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

sched:
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0x202, eflags=0x10002
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0 cr3: 4226000
 gs: 1b0  fs: 0  es: 160  ds: 160
edi: d2f50a60 esi: fef4b2a8 ebp: d2c84d34 esp: d2c84d1c
ebx: d2f54180 edx: d2f541f8 ecx: 1f eax: fed6c870
trp: e err: 10 eip: 0 cs: 158
efl: 10002 usp: 202 ss: d2c84d3c

d2c84c4c unix:die+a7 (e, d2c84cec, 0, 0)
d2c84cd8 unix:trap+f56 (d2c84cec, 0, 0)
d2c84cec unix:cmntrap+83 ()
d2c84d34 0 (d2c84d44, fe81189a,)
d2c84d3c genunix:kdi_dvec_enter+a (d2c84d50, fe81183c,)
d2c84d44 unix:debug_enter+32 (0)
d2c84d50 unix:abort_sequence_enter+27 (0)
d2c84d64 kbtrans:kbtrans_streams_key+3e (d2f54180, 1f, 0)
d2c84d88 kb8042:kb8042_received_byte+b2 (fef4b1a8, 1e)
d2c84da0 kb8042:kb8042_intr+65 (fef4b1a8)
d2c84db8 i8042:i8042_intr+a4 (d2f50980)

----------------------------------------------------------------------------

::cpuinfo -v

 ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH   THREAD   PROC
  0 fec20ae4  1b    8    0 104   no    no t-740847 d2c84de0 sched
                  |    |    |
       RUNNING <--+    |    +--> PIL THREAD
         READY         |           5 d2c84de0
        EXISTS         |           3 d2ca0de0
        ENABLE         |           - d2c28de0 (idle)
                       |
                       +-->  PRI THREAD   PROC
                              99 d2c9ade0 sched
                              99 d2c97de0 sched
                              60 d3264a00 fsflush
                              60 d2e1ade0 sched
                              60 d2e37de0 sched
                              60 d4644de0 sched
                              60 d96dcde0 sched
                              59 d38e7400 Xsun

d2c84de0::thread

    ADDR    STATE  FLG PFLG SFLG   PRI  EPRI PIL     INTR DISPTIME BOUND PR
d2c84de0   onproc  809    0    3   104     0   5 d2ca0de0        0    -1  2

d2ca0de0::thread

    ADDR    STATE  FLG PFLG SFLG   PRI  EPRI PIL     INTR DISPTIME BOUND PR
d2ca0de0   onproc    9    0    3   102     0   3 d2c28de0    46a51    -1  1

d2ca0de0::findstack -v

stack pointer for thread d2ca0de0: d2ca0c2c
  d2ca0de0 0xd94c62bc()

----------------------------------------------------------------------------

After I pressed "F1+A", the kernel created the thread "d2c84de0" to respond to the
keyboard interrupt (PIL = 5, PRI = 104), but another thread, "d2ca0de0", was still running on the CPU at the same time (PIL = 3, PRI = 102). I guess some event caused the kernel to create the thread d2ca0de0, and then the kernel hung until I pressed "F1+A"; at that point the kernel created another thread, d2c84de0, and finally crashed. I have no idea what caused the kernel to create thread d2ca0de0 (PRI = 102, PIL = 3).

[[ Q3 ]]

::cpuinfo

 ID ADDR     FLG NRUN BSPL PRI RNRN KRNRN SWITCH   THREAD   PROC
  0 fec20ae4  1b    8    0 104   no    no t-740847 d2c84de0 sched

::cycinfo -v

CPU  CYC_CPU STATE  NELEMS     ROOT         FIRE HANDLER
  0 d9aabe00 online      4 d9aabd80  96b6b848e80 clock

                                       2
                                       |
                    +------------------+------------------+
                    0                                     1
                    |                                     |
          +---------+--------+                  +---------+---------+
          3
          |
     +----+----+

    ADDR NDX HEAP LEVL   PEND         FIRE USECINT HANDLER
d9aabd80   0    1 high      0  96b6b848e80   10000 cbe_hres_tick
d9aabda0   1    2  low 741253  96b6b848e80   10000 apic_redistribute_compute
d9aabdc0   2    0 lock    406  96b6b848e80   10000 clock
d9aabde0   3    3 high      0  96b6d4e5200 1000000 deadman

-----------------------------------------------------------------------------------

The value of SWITCH for thread d2c84de0 is 740847.
The value of PEND for apic_redistribute_compute is 741253.
The value of PEND for clock is 406.

(741253 - 406) == 740847

What does this mean? Could you please explain it?

[[ Q4 ]]

::ipcs

Message queues:
failed to read 'msq_svc'; module not present

Shared memory:
    ADDR REF  ID  KEY MODE PRJID ZONEID OWNER GROUP CREAT CGRP
d4915f50   1   3  103 0666     3      0  1002   102  1002  102
d3f0b090   1   2  101 0666     3      0     0     0     0    0
d3f0b2c0   1   1  102 0666     3      0  1002   102  1002  102
d3f0bbf0   1   0  100 0666     3      0  1002   102  1002  102

Semaphores:
    ADDR REF  ID  KEY MODE PRJID ZONEID OWNER GROUP CREAT CGRP
d4915ee0   3   3  103 0666     3      0  1002   102  1002  102
d3f0b1e0   3   2  101 0666     3      0     0     0     0    0
d3f0b250   4   1  102 0666     3      0  1002   102  1002  102
d3f0bb80   7   0  100 0666     3      0  1002   102  1002  102

-------------------------------------------

I don't know which threads are accessing the semaphore "d3f0b1e0".
How can I find these unknown threads?

::showrev

Hostname: cetc.a28.com
Release: 5.10
Kernel architecture: i86pc
Application architecture: i386
Kernel version: SunOS 5.10 i86pc Generic
Platform: i86pc

::msgbuf

MESSAGE
/pci@0,0/pci103c,3013@1d,2 (uhci2): failed to attach
pcplusmp: pciclass,0c0300 (uhci) instance 3 vector 0x16 ioapic 0x1 intin 0x16 is bound to cpu 0
/pci@0,0/pci103c,3013@1d,3 (uhci3): failed to attach
cpu0: x86 (GenuineIntel family 15 model 4 step 10 clock 3000 MHz)
cpu0: Intel(r) Pentium(r) 4 CPU 3.00GHz
NOTICE: Broadcom NetXtreme Gigabit Ethernet Driver (32-bit) v8.3.1
PCI-device: pci8086,27e2@1c,5, pci_pci3
pci_pci3 is /pci@0,0/pci8086,27e2@1c,5
pcplusmp: pci14e4,1600 (bcme) instance 0 vector 0x11 ioapic 0x1 intin 0x11 is bound to cpu 0
NOTICE: bcme0 : Broadcom NetXtreme Gigabit Ethernet BCM95752 (Copper) is detected
NOTICE: bcme0 : Firmware version 5752-v3.10
NOTICE: bcme0 : No Link
pcplusmp: pci14e4,1600 (bcme) instance 0 vector 0x11 ioapic 0x1 intin 0x11 is bound to cpu 0
PCI-device: pci103c,3013@0, bcme0
bcme0 is /pci@0,0/pci8086,27e2@1c,5/pci103c,3013@0
pcplusmp: pciclass,0c0300 (uhci) instance 0 vector 0x14 ioapic 0x1 intin 0x14 is bound to cpu 0
/pci@0,0/pci103c,3013@1d (uhci0): failed to attach
pcplusmp: pciclass,0c0300 (uhci) instance 1 vector 0x12 ioapic 0x1 intin 0x12 is bound to cpu 0
/pci@0,0/pci103c,3013@1d,1 (uhci1): failed to attach
pcplusmp: pciclass,0c0300 (uhci) instance 2 vector 0x15 ioapic 0x1 intin 0x15 is bound to cpu 0
/pci@0,0/pci103c,3013@1d,2 (uhci2): failed to attach
pcplusmp: pciclass,0c0300 (uhci) instance 3 vector 0x16 ioapic 0x1 intin 0x16 is bound to cpu 0
/pci@0,0/pci103c,3013@1d,3 (uhci3): failed to attach
UltraDMA mode 5 selected
dump on /dev/dsk/c1d0s1 size 2047 MB
NOTICE: bcme0 : Link is Up (100Mbps, Full Duplex, Rx & Tx Flow Control ON)
pseudo-device: pm0
pm0 is /pseudo/pm@0
pseudo-device: devinfo0
devinfo0 is /pseudo/devinfo@0
xsvc0 at root
xsvc0 is /xsvc
pcplusmp: asy (asy) instance 0 vector 0x4 ioapic 0x1 intin 0x4 is bound to cpu 0
pcplusmp: asy (asy) instance 0 vector 0x4 ioapic 0x1 intin 0x4 is bound to cpu 0
ISA-device: asy0
asy0 is /isa/asy@1,3f8
pseudo-device: pool0
pool0 is /pseudo/pool@0
pseudo-device: vol0
vol0 is /pseudo/vol@0
pcplusmp: ide (ata) instance 0 vector 0xe ioapic 0x1 intin 0xe is bound to cpu 0
pcplusmp: ide (ata) instance 0 vector 0xe ioapic 0x1 intin 0xe is bound to cpu 0
ATAPI device at targ 0, lun 0 lastlun 0x0
model LITE-ON DVD SOHD-16P9S
ATA/ATAPI-6 supported, majver 0x78 minver 0x0
PCI-device: ide@0, ata0
ata0 is /pci@0,0/pci-ide@1f,1/ide@0
ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
PIO mode 4 selected
ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
PIO mode 4 selected
sd0 at ata0: target 0 lun 0
sd0 is /pci@0,0/pci-ide@1f,1/ide@0/sd@0,0
device pciclass,030000@0(display#1) keeps up device sd@0,0(sd#0), but the latter is not power managed
pcplusmp: pciclass,0c0300 (uhci) instance 3 vector 0x16 ioapic 0x1 intin 0x16 is bound to cpu 0
/pci@0,0/pci103c,3013@1d,3 (uhci3): failed to attach
pcplusmp: pciclass,0c0300 (uhci) instance 3 vector 0x16 ioapic 0x1 intin 0x16 is bound to cpu 0
/pci@0,0/pci103c,3013@1d,3 (uhci3): failed to attach
pcplusmp: fdc (fdc) instance 0 vector 0x6 ioapic 0x1 intin 0x6 is bound to cpu 0
pcplusmp: fdc (fdc) instance 0 vector 0x6 ioapic 0x1 intin 0x6 is bound to cpu 0
ISA-device: fdc0
fd0 at fdc0
fd0 is /isa/fdc@1,3f0/fd@0,0
8042 device: mouse@1, mouse8042 # 0
mouse80420 is /isa/i8042@1,60/mouse@1
pseudo-device: pm0
pm0 is /pseudo/pm@0
Pad8 attaching at 14:47:16, Jun 7 2001
pad81 at root: space 0 offset ee704
pad81 is /pad8@0,ee704
Solaris x86 pad driver open.
Solaris x86 pad driver open.
pcplusmp: pci114f,5013 (dsync) instance 0 vector 0x12 ioapic 0x1 intin 0x12 is bound to cpu 0
pcplusmp: pci114f,5013 (dsync) instance 0 vector 0x12 ioapic 0x1 intin 0x12 is bound to cpu 0
WARNING: minor name <dsync1> is not compatible network driver instance <0>
WARNING: minor name <dsync2> is not compatible network driver instance <0>
WARNING: minor name <dsync3> is not compatible network driver instance <0>
NOTICE: bcme0 : No Link
NOTICE: bcme0 : Link is Up (10Mbps, Full Duplex, Rx & Tx Flow Control ON)
NOTICE: bcme0 : No Link
NOTICE: bcme0 : Link is Up (100Mbps, Full Duplex, Rx & Tx Flow Control ON)
NOTICE: bcme0 : No Link
NOTICE: bcme0 : Link is Up (100Mbps, Full Duplex, Rx & Tx Flow Control ON)

panic[cpu0]/thread=d2c84de0:
BAD TRAP: type=e (#pf Page fault) rp=d2c84cec addr=0 occurred in module "<unknown>" due to a NULL pointer dereference

sched:
#pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0x202, eflags=0x10002
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0 cr3: 4226000
 gs: 1b0  fs: 0  es: 160  ds: 160
edi: d2f50a60 esi: fef4b2a8 ebp: d2c84d34 esp: d2c84d1c
ebx: d2f54180 edx: d2f541f8 ecx: 1f eax: fed6c870
trp: e err: 10 eip: 0 cs: 158
efl: 10002 usp: 202 ss: d2c84d3c

d2c84c4c unix:die+a7 (e, d2c84cec, 0, 0)
d2c84cd8 unix:trap+f56 (d2c84cec, 0, 0)
d2c84cec unix:cmntrap+83 ()
d2c84d34 0 (d2c84d44, fe81189a,)
d2c84d3c genunix:kdi_dvec_enter+a (d2c84d50, fe81183c,)
d2c84d44 unix:debug_enter+32 (0)
d2c84d50 unix:abort_sequence_enter+27 (0)
d2c84d64 kbtrans:kbtrans_streams_key+3e (d2f54180, 1f, 0)
d2c84d88 kb8042:kb8042_received_byte+b2 (fef4b1a8, 1e)
d2c84da0 kb8042:kb8042_intr+65 (fef4b1a8)
d2c84db8 i8042:i8042_intr+a4 (d2f50980)

syncing file systems... 2 2 done
dumping to /dev/dsk/c1d0s1, offset 429391872, content: kernel

rivanwang
rivan at vip.sina.com
2007-03-28

This message posted from opensolaris.org