On Thu, Sep 17, 2009 at 10:16 AM, Garrett D'Amore <[email protected]> wrote:
> Look closely at the stack.  You'll notice that a PIL9 interrupt
> *interrupted* e1000g while it was servicing an interrupt.  I don't think
> e1000g is at fault here.  Something else is doing it.

This is probably my lack of knowledge about how Solaris handles
interrupts, but after a little digging:

> 0xffffff0007c49c60::findstack -v
stack pointer for thread ffffff0007c49c60: ffffff0007c49b30
  ffffff0007c49bb0 rm_isr+0xaa()
  ffffff0007c49c00 av_dispatch_autovect+0x7c(10)
  ffffff0007c49c40 dispatch_hardint+0x33(10, 6)
  ffffff0007c4f450 switch_sp_and_call+0x13()
  ffffff0007c4f4a0 do_interrupt+0x9e(ffffff0007c4f4b0, b)
  ffffff0007c4f4b0 _interrupt+0xba()

I'm assuming this portion of the stack dump is what you're talking
about... looking at the function signature for dispatch_hardint -- the
new vector is 10, and the old ipl is 6.

> ::interrupts -d
IRQ  Vect IPL Bus Trg Type  CPU Share APIC/INT# Driver Name(s)
3    0xb1 12  ISA Edg Fixed 0   1     0x0/0x3   asy#1
4    0xb0 12  ISA Edg Fixed 0   1     0x0/0x4   asy#0
6    0x41 5   ISA Edg Fixed 0   1     0x0/0x6   fdc#0
7    0x42 5   ISA Edg Fixed 1   1     0x0/0x7   ecpp#0
9    0x81 9   PCI Lvl Fixed 1   1     0x0/0x9   acpi_wrapper_isr
15   0x43 5   ISA Edg Fixed 0   1     0x0/0xf   ata#1
16   0x83 9   PCI Lvl Fixed 1   4     0x0/0x10  hci1394#0, uhci#3, uhci#0, nvidia#0
17   0x87 8   PCI Lvl Fixed 0   1     0x0/0x11  audio810#0
18   0x86 9   PCI Lvl Fixed 1   1     0x0/0x12  pci-ide#1
19   0x85 9   PCI Lvl Fixed 0   1     0x0/0x13  uhci#1
23   0x84 9   PCI Lvl Fixed 1   1     0x0/0x17  ehci#0
26   0x40 5   PCI Lvl Fixed 1   1     0x1/0x2   aac#0
48   0x60 6   PCI Lvl Fixed 1   1     0x2/0x0   e1000g#0
72   0x82 7   PCI Edg MSI   0   1     -         pcie_pci#0
73   0x30 4   PCI Edg MSI   0   1     -         pcie_pci#2
74   0x44 5   PCI Edg MSI   0   1     -         adpu320#0
160  0xa0 0       Edg IPI   all 0     -         poke_cpu
192  0xc0 13      Edg IPI   all 1     -         xc_serv
208  0xd0 14      Edg IPI   all 1     -         kcpc_hw_overflow_intr
209  0xd1 14      Edg IPI   all 1     -         cbe_fire
210  0xd3 14      Edg IPI   all 1     -         cbe_fire
240  0xe0 15      Edg IPI   all 1     -         xc_serv
241  0xe1 15      Edg IPI   all 1     -         apic_error_intr

That makes sense -- e1000g#0 is IPL 6.  However, shouldn't there then be
an entry somewhere in there with a VECT value of 0x0a and an IPL of 9?
Or do I still have more learning to do?

>
> - Garrett
>
> Jason King wrote:
>>
>> I have a desktop that keeps freezing.. after some work, I managed to
>> force a crashdump via kmdb..  It's a dual-core xeon desktop running OS
>> 2009.06 -- in this case I'm running virtualbox on it with a bridged
>> ethernet connection.
>>
>> My very rudimentary analysis is this:
>>
>> zsh 3 % scat 0
>>
>> Solaris[TM] CAT 5.2 for Solaris 11 64-bit x64
>> SV4990M, Aug 26 2009
>>
>> Copyright © 2009 Sun Microsystems, Inc. All rights reserved.
>> Use is subject to license terms.
>>
>> Feedback regarding the tool should be sent to [email protected]
>> Visit the Solaris CAT blog at http://blogs.sun.com/SolarisCAT
>>
>> opening unix.0 vmcore.0 ...dumphdr...symtab...core...done
>> loading core data: modules...symbols...CTF...done
>>
>> core file:      /var/crash/homer/vmcore.0
>> user:           Jason King (jking:101)
>> release:        5.11 (64-bit)
>> version:        snv_111b
>> machine:        i86pc
>> node name:      homer
>> system type:    i86pc
>> hostid:         4eda84
>> dump_conflags:  0x10000 (DUMP_KERNEL) on /dev/zvol/dsk/rpool/dump(1.96G)
>> snooping:       0x1
>> boothowto:      0x22040 (DEBUG|VERBOSE|KMDB)
>> time of crash:  Thu Sep 17 09:39:15 CDT 2009
>> age of system:  15 hours 32 minutes 10.05 seconds
>> panic CPU:      0 (2 CPUs, 3.93G memory)
>> panic string:   BAD TRAP: type=e (#pf Page fault) rp=ffffff00078d1da0
>> addr=0 occurred in module "<unknown>" due to a NULL pointer
>> dereference
>>
>> sanity checks: settings...vmem...
>> WARNING: CPU0 has cpu_intr_actv for 2
>> WARNING: CPU1 has cpu_intr_actv for 6 9
>> WARNING: last_swtch[1]: 0x553c75 (1 minutes 9.68 seconds earlier)
>> WARNING: PIL9 interrupt thread 0xffffff0007c49c60 on CPU1 pinning PIL6
>> interrupt thread 0xffffff0007c4fc60 pinning IA thread
>> 0xffffff01d6d55740
>> sysent...clock...misc...
>> WARNING: 54 expired realtime (max -1m41.272660310s) and 27 expired
>> normal (max -11.562660310s) callouts
>> done
>>
>> Does this mean that interrupt thread 0xffffff0007c4fc60 is taking too
>> long?  It would explain why the box seems to hang.
>> That thread is:
>>
>> % mdb -k unix.0 vmcore.0
>> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc
>> pcplusmp scsi_vhci zfs sd sockfs ip hook neti sctp arp usba uhci s1394
>> fctl md lofs audiosup fcip fcp cpc random crypto logindmux ptm ufs
>> nsmb sppp ipc ]
>>
>>>
>>> 0xffffff0007c49c60::findstack
>>>
>>
>> stack pointer for thread ffffff0007c49c60: ffffff0007c49b30
>>  ffffff0007c49bb0 rm_isr+0xaa()
>>  ffffff0007c49c00 av_dispatch_autovect+0x7c()
>>  ffffff0007c49c40 dispatch_hardint+0x33()
>>  ffffff0007c4f450 switch_sp_and_call+0x13()
>>  ffffff0007c4f4a0 do_interrupt+0x9e()
>>  ffffff0007c4f4b0 _interrupt+0xba()
>>  ffffff0007c4f5c0 default_lock_delay+0x8c()
>>  ffffff0007c4f630 lock_set_spl_spin+0xc2()
>>  ffffff0007c4f690 mutex_vector_enter+0x45e()
>>  ffffff0007c4f6c0 RTSemEventSignal+0x6a()
>>  ffffff0007c4f740 0xfffffffff836c57b()
>>  ffffff0007c4f770 0xfffffffff836d73a()
>>  ffffff0007c4f830 vboxNetFltSolarisRecv+0x331()
>>  ffffff0007c4f880 VBoxNetFltSolarisModReadPut+0x107()
>>  ffffff0007c4f8f0 putnext+0x21e()
>>  ffffff0007c4f950 dld_str_rx_raw+0xb3()
>>  ffffff0007c4fa10 dls_rx_promisc+0x179()
>>  ffffff0007c4fa50 mac_promisc_dispatch_one+0x5f()
>>  ffffff0007c4fac0 mac_promisc_dispatch+0x105()
>>  ffffff0007c4fb10 mac_rx+0x3e()
>>  ffffff0007c4fb50 mac_rx_ring+0x4c()
>>  ffffff0007c4fbb0 e1000g_intr+0x17e()
>>
>> Do I appear to be on the right track, and can anyone offer any
>> additional suggestions where to go from here (or even recognize the
>> problem)?
>> _______________________________________________
>> driver-discuss mailing list
>> [email protected]
>> http://mail.opensolaris.org/mailman/listinfo/driver-discuss
>>
>

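A possible way to cross-check the dispatch_hardint argument against the
::interrupts table is sketched below. It rests on two assumptions that
nobody in this thread has confirmed: that ::findstack -v prints its
arguments in hexadecimal (the bare "b" passed to do_interrupt suggests
so), and that the first argument should be compared with the IRQ column
rather than the Vect column. The helper names parse_rows and lookup are
made up for illustration; the rows are copied from the output above.

```python
# Fixed and MSI rows copied from the `::interrupts -d` output in this thread.
# Column order: IRQ Vect IPL Bus Trg Type CPU Share APIC/INT# Driver Name(s)
INTERRUPTS = """\
3 0xb1 12 ISA Edg Fixed 0 1 0x0/0x3 asy#1
4 0xb0 12 ISA Edg Fixed 0 1 0x0/0x4 asy#0
6 0x41 5 ISA Edg Fixed 0 1 0x0/0x6 fdc#0
7 0x42 5 ISA Edg Fixed 1 1 0x0/0x7 ecpp#0
9 0x81 9 PCI Lvl Fixed 1 1 0x0/0x9 acpi_wrapper_isr
15 0x43 5 ISA Edg Fixed 0 1 0x0/0xf ata#1
16 0x83 9 PCI Lvl Fixed 1 4 0x0/0x10 hci1394#0, uhci#3, uhci#0, nvidia#0
17 0x87 8 PCI Lvl Fixed 0 1 0x0/0x11 audio810#0
18 0x86 9 PCI Lvl Fixed 1 1 0x0/0x12 pci-ide#1
19 0x85 9 PCI Lvl Fixed 0 1 0x0/0x13 uhci#1
23 0x84 9 PCI Lvl Fixed 1 1 0x0/0x17 ehci#0
26 0x40 5 PCI Lvl Fixed 1 1 0x1/0x2 aac#0
48 0x60 6 PCI Lvl Fixed 1 1 0x2/0x0 e1000g#0
72 0x82 7 PCI Edg MSI 0 1 - pcie_pci#0
73 0x30 4 PCI Edg MSI 0 1 - pcie_pci#2
74 0x44 5 PCI Edg MSI 0 1 - adpu320#0
"""


def parse_rows(text):
    """Turn the whitespace-separated rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        # Split into at most 10 fields so the driver list stays in one piece.
        fields = line.split(None, 9)
        if len(fields) < 10:
            continue
        rows.append({
            "irq": int(fields[0]),        # IRQ column is decimal
            "vect": int(fields[1], 16),   # Vect column is hexadecimal
            "ipl": int(fields[2]),
            "drivers": fields[9],
        })
    return rows


def lookup(rows, arg_text):
    """Match a stack argument against both the IRQ and Vect columns.

    ASSUMPTION: arg_text is hexadecimal without a 0x prefix, as mdb
    appears to print it in the stack trace above.
    """
    value = int(arg_text, 16)
    by_irq = [r for r in rows if r["irq"] == value]
    by_vect = [r for r in rows if r["vect"] == value]
    return by_irq, by_vect


if __name__ == "__main__":
    rows = parse_rows(INTERRUPTS)
    by_irq, by_vect = lookup(rows, "10")   # first arg of dispatch_hardint
    for r in by_irq:
        print("IRQ match: irq=%d ipl=%d drivers=%s"
              % (r["irq"], r["ipl"], r["drivers"]))
    for r in by_vect:
        print("Vect match: vect=0x%x ipl=%d drivers=%s"
              % (r["vect"], r["ipl"], r["drivers"]))
```

Under those assumptions, 10 reads as 0x10 (decimal 16) and matches the
IRQ 16 row, which is at IPL 9 and shared by hci1394#0, uhci#3, uhci#0
and nvidia#0. That would be consistent with the PIL9 warning from scat,
but it does not by itself show which of the four handlers is
responsible.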