Re: Sporadic 9.0-RC2 boot-time panic
On Wed, 7 Dec 2011, Mike Andrews wrote:
> On 12/5/11 9:39 PM, Mike Andrews wrote:
>> On 12/1/2011 6:03 PM, Mike Andrews wrote:
>>> On 11/28/11 5:48 PM, Ronald Klop wrote:
>>>> On Mon, 28 Nov 2011 23:37:27 +0100, Mike Andrews mandr...@bit0.com wrote:
>>>>> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll
>>>>> get one of the following two panics during multiuser startup,
>>>>> usually while running the /usr/local/etc/rc.d scripts. (The
>>>>> instruction pointer is always exactly one of these two, and they
>>>>> look fairly related.) If after two or three reboots it manages to
>>>>> not panic, the system will run perfectly stable.
>>
>> FYI this is still happening on 9.0-RC3 -- r228247 to be precise.
>>
>> It only seems to be happening on one particular model of motherboard
>> (Supermicro X8STi-F) but it is happening on several identical
>> machines with them -- running on several other (mostly Supermicro)
>> boards is just fine, including at least one with the exact same
>> 82574L NICs.
>>
>> Whoever's wanting to work on this, contact me off-list to get some
>> more up to date console logs and the kernel config.
>
> ...or just look at the newly opened kern/163117 PR.

... *crickets*

...or wait and hope r228386 and r228387 fix it, though I doubt they'll
get merged before release. I will see about finding test hardware that
I can run -current on.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Sporadic 9.0-RC2 boot-time panic
On 12/5/11 9:39 PM, Mike Andrews wrote:
> On 12/1/2011 6:03 PM, Mike Andrews wrote:
>> On 11/28/11 5:48 PM, Ronald Klop wrote:
>>> On Mon, 28 Nov 2011 23:37:27 +0100, Mike Andrews mandr...@bit0.com wrote:
>>>> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll
>>>> get one of the following two panics during multiuser startup,
>>>> usually while running the /usr/local/etc/rc.d scripts. (The
>>>> instruction pointer is always exactly one of these two, and they
>>>> look fairly related.) If after two or three reboots it manages to
>>>> not panic, the system will run perfectly stable.
>
> FYI this is still happening on 9.0-RC3 -- r228247 to be precise.
>
> It only seems to be happening on one particular model of motherboard
> (Supermicro X8STi-F) but it is happening on several identical
> machines with them -- running on several other (mostly Supermicro)
> boards is just fine, including at least one with the exact same
> 82574L NICs.
>
> Whoever's wanting to work on this, contact me off-list to get some
> more up to date console logs and the kernel config.

...or just look at the newly opened kern/163117 PR.
Re: Sporadic 9.0-RC2 boot-time panic
On 12/1/2011 6:03 PM, Mike Andrews wrote:
> On 11/28/11 5:48 PM, Ronald Klop wrote:
>> On Mon, 28 Nov 2011 23:37:27 +0100, Mike Andrews mandr...@bit0.com wrote:
>>> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll
>>> get one of the following two panics during multiuser startup,
>>> usually while running the /usr/local/etc/rc.d scripts. (The
>>> instruction pointer is always exactly one of these two, and they
>>> look fairly related.) If after two or three reboots it manages to
>>> not panic, the system will run perfectly stable.

FYI this is still happening on 9.0-RC3 -- r228247 to be precise.

It only seems to be happening on one particular model of motherboard
(Supermicro X8STi-F) but it is happening on several identical machines
with them -- running on several other (mostly Supermicro) boards is
just fine, including at least one with the exact same 82574L NICs.

Whoever's wanting to work on this, contact me off-list to get some more
up to date console logs and the kernel config.
Re: Sporadic 9.0-RC2 boot-time panic
On 11/28/11 5:48 PM, Ronald Klop wrote:
> On Mon, 28 Nov 2011 23:37:27 +0100, Mike Andrews mandr...@bit0.com wrote:
>> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get
>> one of the following two panics during multiuser startup, usually
>> while running the /usr/local/etc/rc.d scripts. (The instruction
>> pointer is always exactly one of these two, and they look fairly
>> related.) If after two or three reboots it manages to not panic, the
>> system will run perfectly stable.
>>
>> For some probably-unrelated reason, the dump never finishes in
>> either case.
>>
>> First panic (note em0 warning before it):
>>
>> em0: discard frame w/o packet header
>>
>> Fatal trap 9: general protection fault while in kernel mode
>> cpuid = 0; apic id = 00
>> instruction pointer     = 0x20:0x805e4fc5
>> stack pointer           = 0x28:0xff80003299e0
>> frame pointer           = 0x28:0xff8000329a00
>> code segment            = base 0x0, limit 0xf, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 12 (irq256: em0:rx 0)
>> trap number             = 9
>> panic: general protection fault
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>> kdb_backtrace() at kdb_backtrace+0x37
>> panic() at panic+0x187
>> trap_fatal() at trap_fatal+0x290
>> trap() at trap+0x10a
>> calltrap() at calltrap+0x8
>> --- trap 0x9, rip = 0x805e4fc5, rsp = 0xff80003299e0, rbp = 0xff8000329a00 ---
>> m_freem() at m_freem+0x25
>> ether_nh_input() at ether_nh_input+0x82
>> netisr_dispatch_src() at netisr_dispatch_src+0x20b
>> em_rxeof() at em_rxeof+0x1ca
>> em_msix_rx() at em_msix_rx+0x24
>> intr_event_execute_handlers() at intr_event_execute_handlers+0x104
>> ithread_loop() at ithread_loop+0xa4
>> fork_exit() at fork_exit+0x11f
>> fork_trampoline() at fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xff8000329d00, rbp = 0 ---
>> Uptime: 49s
>> Dumping 679 out of 12263 MB:
>>
>> Second panic (no em0 discard warning this time):
>>
>> Fatal trap 9: general protection fault while in kernel mode
>> cpuid = 0; apic id = 00
>> instruction pointer     = 0x20:0x8063c0e4
>> stack pointer           = 0x28:0xff8000329a00
>> frame pointer           = 0x28:0xff8000329a40
>> code segment            = base 0x0, limit 0xf, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 12 (irq256: em0:rx 0)
>> trap number             = 9
>> panic: general protection fault
>> cpuid = 0
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
>> kdb_backtrace() at kdb_backtrace+0x37
>> panic() at panic+0x187
>> trap_fatal() at trap_fatal+0x290
>> trap() at trap+0x10a
>> calltrap() at calltrap+0x8
>> --- trap 0x9, rip = 0x8063c0e4, rsp = 0xff8000329a00, rbp = 0xff8000329a40 ---
>> ether_nh_input() at ether_nh_input+0x94
>> netisr_dispatch_src() at netisr_dispatch_src+0x20b
>> em_rxeof() at em_rxeof+0x1ca
>> em_msix_rx() at em_msix_rx+0x24
>> intr_event_execute_handlers() at intr_event_execute_handlers+0x104
>> ithread_loop() at ithread_loop+0xa4
>> fork_exit() at fork_exit+0x11f
>> fork_trampoline() at fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xff8000329d00, rbp = 0 ---
>> Uptime: 46s
>> Dumping 657 out of 12263 MB:..3%
>
> Does it help if you disable msix on your em0? Google for 'sysctl em
> msix'. Or run 'sysctl -a | grep msix'.

OK, setting hw.em.enable_msix=0 in /boot/loader.conf does NOT help.
Re: Sporadic 9.0-RC2 boot-time panic
On Monday, November 28, 2011 5:37:27 pm Mike Andrews wrote:
> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get
> one of the following two panics during multiuser startup, usually
> while running the /usr/local/etc/rc.d scripts. (The instruction
> pointer is always exactly one of these two, and they look fairly
> related.) If after two or three reboots it manages to not panic, the
> system will run perfectly stable.
>
> For some probably-unrelated reason, the dump never finishes in either
> case.
>
> First panic (note em0 warning before it):
>
> em0: discard frame w/o packet header

This is odd. I see one bug that could possibly trigger this, but not on
x86:

Index: if_em.c
===================================================================
--- if_em.c (revision 228074)
+++ if_em.c (working copy)
@@ -4305,8 +4305,10 @@ em_rxeof(struct rx_ring *rxr, int count, int *done
 #ifndef __NO_STRICT_ALIGNMENT
             if (adapter->max_frame_size >
                 (MCLBYTES - ETHER_ALIGN) &&
-                em_fixup_rx(rxr) != 0)
-                goto skip;
+                em_fixup_rx(rxr) != 0) {
+                sendmp = NULL;
+                goto next_desc;
+            }
 #endif
             if (status & E1000_RXD_STAT_VP) {
                 sendmp->m_pkthdr.ether_vtag =
@@ -4318,9 +4320,6 @@ em_rxeof(struct rx_ring *rxr, int count, int *done
             sendmp->m_pkthdr.flowid = rxr->msix;
             sendmp->m_flags |= M_FLOWID;
 #endif
-#ifndef __NO_STRICT_ALIGNMENT
-skip:
-#endif
             rxr->fmp = rxr->lmp = NULL;
         }
 next_desc:
@@ -4426,6 +4425,7 @@ em_fixup_rx(struct rx_ring *rxr)
             adapter->dropped_pkts++;
             m_freem(rxr->fmp);
             rxr->fmp = NULL;
+            rxr->lmp = NULL;
             error = ENOMEM;
         }
     }

-- 
John Baldwin
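To make the intent of the patch above easier to follow, here is a minimal user-space sketch of the failure path as I read it from the diff. It is not the driver code: `mbuf_sketch`, `rx_ring_sketch`, `chain_free()`, `rx_complete()` and `hands_up_freed_mbuf()` are hypothetical stand-ins, the `freed` flag replaces a real `m_freem()`, and the control flow is simplified to the one branch the diff touches. The assumption being illustrated is that, before the patch, a failed `em_fixup_rx()` freed the chain but left `sendmp` pointing at it, so the freed mbuf could still be handed up the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mbuf; only what the sketch needs. */
struct mbuf_sketch {
    struct mbuf_sketch *m_next;
    int freed;                  /* set instead of really freeing memory */
};

/* Hypothetical stand-in for the driver's per-queue receive state. */
struct rx_ring_sketch {
    struct mbuf_sketch *fmp;    /* first mbuf of the frame being built */
    struct mbuf_sketch *lmp;    /* last mbuf of that frame */
};

/* Mark every mbuf in the chain as freed (stand-in for m_freem()). */
static void chain_free(struct mbuf_sketch *m)
{
    for (; m != NULL; m = m->m_next)
        m->freed = 1;
}

/*
 * Simplified end-of-packet handling. Returns the mbuf that would be
 * passed up the stack, or NULL if nothing should be passed.
 */
static struct mbuf_sketch *rx_complete(struct rx_ring_sketch *rxr,
                                       int fixup_fails, int patched)
{
    struct mbuf_sketch *sendmp = rxr->fmp;

    if (fixup_fails) {
        /* What the failure branch of the fixup routine does. */
        chain_free(rxr->fmp);
        rxr->fmp = NULL;
        if (patched) {
            rxr->lmp = NULL;    /* added by the patch */
            sendmp = NULL;      /* added by the patch */
            return sendmp;      /* models "goto next_desc" */
        }
        /* Unpatched: models "goto skip", sendmp is left untouched. */
    }
    rxr->fmp = rxr->lmp = NULL;
    return sendmp;
}

/* Returns 1 if a freed mbuf would be handed up after a failed fixup. */
static int hands_up_freed_mbuf(int patched)
{
    struct mbuf_sketch second = { NULL, 0 };
    struct mbuf_sketch first = { &second, 0 };
    struct rx_ring_sketch rxr = { &first, &second };
    struct mbuf_sketch *sendmp = rx_complete(&rxr, 1, patched);

    /* Either way the chain is gone and the ring holds no pointers. */
    assert(first.freed && second.freed);
    assert(rxr.fmp == NULL && rxr.lmp == NULL);
    return sendmp != NULL && sendmp->freed;
}
```

Calling `hands_up_freed_mbuf(0)` models the unpatched path and `hands_up_freed_mbuf(1)` the patched one. Note that the whole branch sits under `#ifndef __NO_STRICT_ALIGNMENT` in the diff, which is why the message says the bug should not be reachable on x86.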
Re: Sporadic 9.0-RC2 boot-time panic
On 11/29/2011 10:50 AM, John Baldwin wrote:
> On Monday, November 28, 2011 5:37:27 pm Mike Andrews wrote:
>> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get
>> one of the following two panics during multiuser startup, usually
>> while running the /usr/local/etc/rc.d scripts. (The instruction
>> pointer is always exactly one of these two, and they look fairly
>> related.) If after two or three reboots it manages to not panic, the
>> system will run perfectly stable.
>>
>> For some probably-unrelated reason, the dump never finishes in
>> either case.
>>
>> First panic (note em0 warning before it):
>>
>> em0: discard frame w/o packet header
>
> This is odd. I see one bug that could possibly trigger this, but not
> on x86:

This is amd64, which of course depending on what you meant by "not on
x86" may or may not be the same thing ;-)

This is with RELENG_9_0 sources built yesterday morning (Nov 28).
Kernel config's reasonably close to GENERIC with many unused drivers
removed. Hardware is Supermicro X8STi-F -- we do have other (older)
systems we haven't yet tried upgrading that have slightly different em
revs -- maybe I'll try one of those today just to see if it's 82574L
specific.

em0: Intel(R) PRO/1000 Network Connection 7.2.3 port 0xdc00-0xdc1f mem 0xfbce-0xfbcf,0xfbcdc000-0xfbcd irq 16 at device 0.0 on pci1
em0: Using MSIX interrupts with 3 vectors
em0: Ethernet address: 00:25:90:xx:xx:xx
em1: Intel(R) PRO/1000 Network Connection 7.2.3 port 0xec00-0xec1f mem 0xfbde-0xfbdf,0xfbddc000-0xfbdd irq 16 at device 0.0 on pci2
em1: Using MSIX interrupts with 3 vectors
em1: Ethernet address: 00:25:90:xx:xx:xx

em0@pci0:1:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfbce, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0xdc00, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xfbcdc000, size 16384, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
    cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
em1@pci0:2:0:0: class=0x02 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfbde, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0xec00, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xfbddc000, size 16384, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
    cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected

> Index: if_em.c
> ===================================================================
> --- if_em.c (revision 228074)
> +++ if_em.c (working copy)
> @@ -4305,8 +4305,10 @@ em_rxeof(struct rx_ring *rxr, int count, int *done
>  #ifndef __NO_STRICT_ALIGNMENT
>              if (adapter->max_frame_size >
>                  (MCLBYTES - ETHER_ALIGN) &&
> -                em_fixup_rx(rxr) != 0)
> -                goto skip;
> +                em_fixup_rx(rxr) != 0) {
> +                sendmp = NULL;
> +                goto next_desc;
> +            }
>  #endif
>              if (status & E1000_RXD_STAT_VP) {
>                  sendmp->m_pkthdr.ether_vtag =
> @@ -4318,9 +4320,6 @@ em_rxeof(struct rx_ring *rxr, int count, int *done
>              sendmp->m_pkthdr.flowid = rxr->msix;
>              sendmp->m_flags |= M_FLOWID;
>  #endif
> -#ifndef __NO_STRICT_ALIGNMENT
> -skip:
> -#endif
>              rxr->fmp = rxr->lmp = NULL;
>          }
>  next_desc:
> @@ -4426,6 +4425,7 @@ em_fixup_rx(struct rx_ring *rxr)
>              adapter->dropped_pkts++;
>              m_freem(rxr->fmp);
>              rxr->fmp = NULL;
> +            rxr->lmp = NULL;
>              error = ENOMEM;
>          }
>      }
Re: Sporadic 9.0-RC2 boot-time panic
On Tuesday, November 29, 2011 3:27:43 pm Mike Andrews wrote:
> On 11/29/2011 10:50 AM, John Baldwin wrote:
>> On Monday, November 28, 2011 5:37:27 pm Mike Andrews wrote:
>>> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll
>>> get one of the following two panics during multiuser startup,
>>> usually while running the /usr/local/etc/rc.d scripts. (The
>>> instruction pointer is always exactly one of these two, and they
>>> look fairly related.) If after two or three reboots it manages to
>>> not panic, the system will run perfectly stable.
>>>
>>> For some probably-unrelated reason, the dump never finishes in
>>> either case.
>>>
>>> First panic (note em0 warning before it):
>>>
>>> em0: discard frame w/o packet header
>>
>> This is odd. I see one bug that could possibly trigger this, but not
>> on x86:
>
> This is amd64, which of course depending on what you meant by "not on
> x86" may or may not be the same thing ;-)

x86 == (amd64 | i386) :)

-- 
John Baldwin
Re: Sporadic 9.0-RC2 boot-time panic
On Mon, 28 Nov 2011 23:37:27 +0100, Mike Andrews mandr...@bit0.com wrote:
> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get
> one of the following two panics during multiuser startup, usually
> while running the /usr/local/etc/rc.d scripts. (The instruction
> pointer is always exactly one of these two, and they look fairly
> related.) If after two or three reboots it manages to not panic, the
> system will run perfectly stable.
>
> For some probably-unrelated reason, the dump never finishes in either
> case.
>
> First panic (note em0 warning before it):
>
> em0: discard frame w/o packet header
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0x805e4fc5
> stack pointer           = 0x28:0xff80003299e0
> frame pointer           = 0x28:0xff8000329a00
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (irq256: em0:rx 0)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> trap_fatal() at trap_fatal+0x290
> trap() at trap+0x10a
> calltrap() at calltrap+0x8
> --- trap 0x9, rip = 0x805e4fc5, rsp = 0xff80003299e0, rbp = 0xff8000329a00 ---
> m_freem() at m_freem+0x25
> ether_nh_input() at ether_nh_input+0x82
> netisr_dispatch_src() at netisr_dispatch_src+0x20b
> em_rxeof() at em_rxeof+0x1ca
> em_msix_rx() at em_msix_rx+0x24
> intr_event_execute_handlers() at intr_event_execute_handlers+0x104
> ithread_loop() at ithread_loop+0xa4
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff8000329d00, rbp = 0 ---
> Uptime: 49s
> Dumping 679 out of 12263 MB:
>
> Second panic (no em0 discard warning this time):
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0x8063c0e4
> stack pointer           = 0x28:0xff8000329a00
> frame pointer           = 0x28:0xff8000329a40
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (irq256: em0:rx 0)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> trap_fatal() at trap_fatal+0x290
> trap() at trap+0x10a
> calltrap() at calltrap+0x8
> --- trap 0x9, rip = 0x8063c0e4, rsp = 0xff8000329a00, rbp = 0xff8000329a40 ---
> ether_nh_input() at ether_nh_input+0x94
> netisr_dispatch_src() at netisr_dispatch_src+0x20b
> em_rxeof() at em_rxeof+0x1ca
> em_msix_rx() at em_msix_rx+0x24
> intr_event_execute_handlers() at intr_event_execute_handlers+0x104
> ithread_loop() at ithread_loop+0xa4
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff8000329d00, rbp = 0 ---
> Uptime: 46s
> Dumping 657 out of 12263 MB:..3%

Does it help if you disable msix on your em0? Google for 'sysctl em
msix'. Or run 'sysctl -a | grep msix'.

NB: I know nothing about the details of em or msix, so hopefully
somebody with more clue responds also.

Ronald.
Re: Sporadic 9.0-RC2 boot-time panic
On Mon, Nov 28, 2011 at 05:37:27PM -0500, Mike Andrews wrote:
> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get
> one of the following two panics during multiuser startup, usually
> while running the /usr/local/etc/rc.d scripts. (The instruction
> pointer is always exactly one of these two, and they look fairly
> related.) If after two or three reboots it manages to not panic, the
> system will run perfectly stable.
>
> For some probably-unrelated reason, the dump never finishes in either
> case.
>
> First panic (note em0 warning before it):
>
> em0: discard frame w/o packet header
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0x805e4fc5
> stack pointer           = 0x28:0xff80003299e0
> frame pointer           = 0x28:0xff8000329a00
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (irq256: em0:rx 0)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> trap_fatal() at trap_fatal+0x290
> trap() at trap+0x10a
> calltrap() at calltrap+0x8
> --- trap 0x9, rip = 0x805e4fc5, rsp = 0xff80003299e0, rbp = 0xff8000329a00 ---
> m_freem() at m_freem+0x25
> ether_nh_input() at ether_nh_input+0x82
> netisr_dispatch_src() at netisr_dispatch_src+0x20b
> em_rxeof() at em_rxeof+0x1ca
> em_msix_rx() at em_msix_rx+0x24
> intr_event_execute_handlers() at intr_event_execute_handlers+0x104
> ithread_loop() at ithread_loop+0xa4
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff8000329d00, rbp = 0 ---
> Uptime: 49s
> Dumping 679 out of 12263 MB:
>
> Second panic (no em0 discard warning this time):
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0x8063c0e4
> stack pointer           = 0x28:0xff8000329a00
> frame pointer           = 0x28:0xff8000329a40
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (irq256: em0:rx 0)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> trap_fatal() at trap_fatal+0x290
> trap() at trap+0x10a
> calltrap() at calltrap+0x8
> --- trap 0x9, rip = 0x8063c0e4, rsp = 0xff8000329a00, rbp = 0xff8000329a40 ---
> ether_nh_input() at ether_nh_input+0x94
> netisr_dispatch_src() at netisr_dispatch_src+0x20b
> em_rxeof() at em_rxeof+0x1ca
> em_msix_rx() at em_msix_rx+0x24
> intr_event_execute_handlers() at intr_event_execute_handlers+0x104
> ithread_loop() at ithread_loop+0xa4
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff8000329d00, rbp = 0 ---
> Uptime: 46s
> Dumping 657 out of 12263 MB:..3%

We need the following things:

* uname -a output
* dmesg output (only details specific to emX NICs please)
* pciconf -lvcb output (only details specific to emX NICs please)

CC'ing Jack Vogel (driver author) who can hopefully shed some light on
this.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |