Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
On Thu, Oct 05, 2006 at 10:34:25PM -0400, Kris Kennaway wrote:
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it points
> to an underlying problem with fast interrupt handlers on a shared irq
> that can be studied separately.

I'm a bit behind in mailing list traffic (700 unread in -stable, yikes!). I can confirm that this works around the problem for me. It also seems to prevent the USB controller that shares the irq from locking up.

Craig

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
At 10:34 PM 10/5/2006, Kris Kennaway wrote:
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*. Note
> that this patch will not help you if you are not using the em driver,
> or if you are seeing the problem with non-shared em interrupt (I have
> investigated one such outlier, which seems to be a problem with a
> particular model of em hardware and not a generic problem with the
> driver).
>
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it points
> to an underlying problem with fast interrupt handlers on a shared irq
> that can be studied separately.

I ran into an em0 timeout on a box I just started testing. The patch seems to fix the issue.

(before the patch)

Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
Oct 13 21:42:58 am64 kernel: em0: link state changed to UP

dmesg with patch

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-PRERELEASE #2: Fri Oct 13 22:28:38 EDT 2006
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/up
ACPI APIC Table: A M I OEMAPIC
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.71-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x649d<SSE3,RSVD2,MON,DS_CPL,EST,CNTX-ID,CX16,b14>
  AMD Features=0x2800<SYSCALL,LM>
  Logical CPUs per core: 2
real memory  = 3481198592 (3319 MB)
avail memory = 3360186368 (3204 MB)
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
ioapic2 Version 2.0 irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: A M I 7221BK1E on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi0: reservation of 500, 10 (4) failed
acpi0: reservation of 560, 20 (4) failed
Timecounter ACPI-safe frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
acpi_throttle0: ACPI CPU Throttling on cpu0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pci0: display, VGA at device 2.0 (no driver attached)
pcib1: ACPI PCI-PCI bridge irq 16 at device 28.0 on pci0
pci2: ACPI PCI bus on pcib1
pcib2: ACPI PCI-PCI bridge at device 0.0 on pci2
pci4: ACPI PCI bus on pcib2
pcib3: ACPI PCI-PCI bridge at device 0.2 on pci2
pci3: ACPI PCI bus on pcib3
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: 3ware 9000 series Storage Controller port 0xef80-0xefbf mem 0xfebff000-0xfebf irq 53 at device 2.0 on pci3
twa0: [GIANT-LOCKED]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.01.01.028, BIOS BE9X 3.01.00.024
uhci0: Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A port 0xcc00-0xcc1f irq 23 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B port 0xcc80-0xcc9f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C port 0xcd00-0xcd1f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0: Intel 82801FB (ICH6) USB 2.0 controller mem 0xfe9ff800-0xfe9ffbff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: Intel 82801FB (ICH6) USB 2.0 controller on ehci0
usb3: USB revision 2.0
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
pcib4: ACPI PCI-PCI bridge at device 30.0 on pci0
pci1: ACPI PCI bus on pcib4
em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 0xdf80-0xdfbf mem 0xfeae-0xfeaf irq 18 at device 3.0 on pci1
em0: Ethernet address: 00:0e:0c:4b:15:eb
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel ICH6 UDMA100
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
At 12:31 AM 10/14/2006, Scott Long wrote:
> Mike,
>
> I have a new patch that I hope addresses the actual bug, instead of
> shuffling the timing. Would you be willing to test it? I can't
> guarantee that it's safe for production use yet, though. It seems to
> work, but it might set your dog on fire too.

Yes, for sure, as the box is just for testing mysql right now. I don't think we will end up even using it in production, as the whole MB runs insanely hot.

---Mike
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
On 6 Oct 2006, at 04:34, Kris Kennaway wrote:
> On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
>> On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
>>> All,
>>>
>>> I'm seeing some patterns here with all of the network driver problem
>>> reports, but I need more information to help narrow it down further.
>>> I ask all of you who are having problems to take a minute to fill
>>> out this survey and return it to Kris Kennaway (on cc:) and myself.
>>> Thanks.
>>>
>>> 1. Are you experiencing network hangs and/or timeout messages on the
>>> console? If yes, please provide a _brief_ description of the problem.
>>
>> OK, next question, to all em users:
>>
>> If your em device is using a shared interrupt, and you are NOT
>> experiencing timeout problems when using this device, please let me
>> know:
>
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*. Note
> that this patch will not help you if you are not using the em driver,
> or if you are seeing the problem with non-shared em interrupt (I have
> investigated one such outlier, which seems to be a problem with a
> particular model of em hardware and not a generic problem with the
> driver).
>
> Index: if_em.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
> retrieving revision 1.65.2.18
> diff -u -u -r1.65.2.18 if_em.c
> --- if_em.c     25 Aug 2006 12:38:26 -0000      1.65.2.18
> +++ if_em.c     5 Oct 2006 22:05:45 -0000
> @@ -2086,7 +2086,7 @@
>         taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
>             device_get_nameunit(adapter->dev));
>         if ((error = bus_setup_intr(dev, adapter->res_interrupt,
> -           INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
> +           INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
>             &adapter->int_handler_tag)) != 0) {
>                 device_printf(dev, "Failed to register fast interrupt "
>                             "handler: %d\n", error);
>
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it points
> to an underlying problem with fast interrupt handlers on a shared irq
> that can be studied separately.

I tested this on one of my other systems where em0 and USB share an interrupt. The patch removes the watchdog timeout and makes the system usable. Without it, the system sometimes does not come up successfully at all, and at other times it drops off the face of the earth as soon as some network I/O is done in combination with disk I/O.

--
Frode Nordahl
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
On Oct 5, 2006, at 19:34 , Kris Kennaway wrote:
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.

This solves the em(4) issue for me on a shared interrupt. Prior to this, the network hang (no watchdog timeouts) was trivially reproducible with an NFS-mounted FreeBSD repository to two builder boxes, and running cvs -q upd on the ports tree at the same time. (The builder boxes also have em(4) interfaces, which I haven't patched, but they're running 7.0-CURRENT.) Everything is i386.

[EMAIL PROTECTED]:/dtbox] 739# vmstat -i
...
irq21: em0 acpi0 965426857 ...

-aDe
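
The vmstat -i excerpt above shows what a shared interrupt looks like: more than one device name on the same irq line (here em0 and acpi0 on irq21). As a minimal sketch, assuming the line layout quoted above (irq number, device names, then the interrupt total), the check can be written in C. The helper name count_irq_devices is hypothetical and is not part of vmstat or any FreeBSD API; it also assumes that device names never begin with a digit.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the device names on one line of
 * `vmstat -i` output, assuming the layout
 *     irq21: em0 acpi0 965426857 <rate>
 * Tokens after the colon are treated as device names until the
 * first token that starts with a digit (the interrupt total).
 * Returns 0 if the line contains no colon at all.
 */
static int
count_irq_devices(const char *line)
{
	const char *p = strchr(line, ':');
	int ndev = 0;

	if (p == NULL)
		return (0);
	p++;
	for (;;) {
		/* Skip whitespace between tokens. */
		while (*p != '\0' && isspace((unsigned char)*p))
			p++;
		/* Stop at end of line or at the first numeric column. */
		if (*p == '\0' || isdigit((unsigned char)*p))
			break;
		ndev++;
		/* Skip over the device name itself. */
		while (*p != '\0' && !isspace((unsigned char)*p))
			p++;
	}
	return (ndev);
}
```

A result greater than 1 means the irq is shared, which is the only case the workaround in this thread is meant for. Called on a line with the same layout as the irq21 entry above, the function counts two devices.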
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
Kris Kennaway ([EMAIL PROTECTED]) on 05/10/2006 at 22:34 wrote:
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
[...]
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it points
> to an underlying problem with fast interrupt handlers on a shared irq
> that can be studied separately.

# mojito uptime
14:23 up 1:59, 4 users, load averages: 0,07 0,05 0,01
# mojito uname -v
FreeBSD 6.2-PRERELEASE #15: Fri Oct 6 12:11:36 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUG

Your patch fixes my em/nvidia issue.

Thanks Kris
--
bug
Patch available for shared em interrupts (Re: em, bge, network problems survey.)
On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
> On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
>> All,
>>
>> I'm seeing some patterns here with all of the network driver problem
>> reports, but I need more information to help narrow it down further.
>> I ask all of you who are having problems to take a minute to fill
>> out this survey and return it to Kris Kennaway (on cc:) and myself.
>> Thanks.
>>
>> 1. Are you experiencing network hangs and/or timeout messages on the
>> console? If yes, please provide a _brief_ description of the problem.
>
> OK, next question, to all em users:
>
> If your em device is using a shared interrupt, and you are NOT
> experiencing timeout problems when using this device, please let me
> know:

Based on successful testing on a machine with shared em interrupt, the following patch should work around the problem *in that case*. Note that this patch will not help you if you are not using the em driver, or if you are seeing the problem with non-shared em interrupt (I have investigated one such outlier, which seems to be a problem with a particular model of em hardware and not a generic problem with the driver).

Index: if_em.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.65.2.18
diff -u -u -r1.65.2.18 if_em.c
--- if_em.c     25 Aug 2006 12:38:26 -0000      1.65.2.18
+++ if_em.c     5 Oct 2006 22:05:45 -0000
@@ -2086,7 +2086,7 @@
        taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
            device_get_nameunit(adapter->dev));
        if ((error = bus_setup_intr(dev, adapter->res_interrupt,
-           INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
+           INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
            &adapter->int_handler_tag)) != 0) {
                device_printf(dev, "Failed to register fast interrupt "
                            "handler: %d\n", error);

Please let Scott and me know whether or not this patch works for you (in addition to the information previously requested, if you have not already sent it). Unfortunately it is only a workaround, but it points to an underlying problem with fast interrupt handlers on a shared irq that can be studied separately.

Kris