Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-27 Thread Craig Boston
On Thu, Oct 05, 2006 at 10:34:25PM -0400, Kris Kennaway wrote:
> Please let Scott and I know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

I'm a bit behind in mailing list traffic (700 unread in -stable,
yikes!).  I can confirm that this works around the problem for me.  It
also seems to prevent the USB controller the irq is shared with from
locking up.

Craig
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-13 Thread Mike Tancsa

At 10:34 PM 10/5/2006, Kris Kennaway wrote:


> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
>
> Note that this patch will not help you if you are not using the em
> driver, or if you are seeing the problem with non-shared em interrupt
> (I have investigated one such outlier, which seems to be a problem with
> a particular model of em hardware and not a generic problem with the
> driver).
>
> Please let Scott and I know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.


I ran into an em0 timeout on a box I just started testing. The patch
seems to fix the issue.

(before the patch)
Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
Oct 13 21:42:58 am64 kernel: em0: link state changed to UP

dmesg with patch

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-PRERELEASE #2: Fri Oct 13 22:28:38 EDT 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/up
ACPI APIC Table: <A M I  OEMAPIC >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.71-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x649d<SSE3,RSVD2,MON,DS_CPL,EST,CNTX-ID,CX16,<b14>>
  AMD Features=0x2800<SYSCALL,LM>
  Logical CPUs per core: 2
real memory  = 3481198592 (3319 MB)
avail memory = 3360186368 (3204 MB)
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: <A M I 7221BK1E> on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi0: reservation of 500, 10 (4) failed
acpi0: reservation of 560, 20 (4) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <display, VGA> at device 2.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci2
pci4: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci2
pci3: <ACPI PCI bus> on pcib3
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0xef80-0xefbf mem 0xfebff000-0xfebf irq 53 at device 2.0 on pci3
twa0: [GIANT-LOCKED]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.01.01.028, BIOS BE9X 3.01.00.024
uhci0: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A> port 0xcc00-0xcc1f irq 23 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B> port 0xcc80-0xcc9f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C> port 0xcd00-0xcd1f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0: <Intel 82801FB (ICH6) USB 2.0 controller> mem 0xfe9ff800-0xfe9ffbff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: <Intel 82801FB (ICH6) USB 2.0 controller> on ehci0
usb3: USB revision 2.0
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xdf80-0xdfbf mem 0xfeae-0xfeaf irq 18 at device 3.0 on pci1
em0: Ethernet address: 00:0e:0c:4b:15:eb
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel ICH6 UDMA100 

Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-13 Thread Mike Tancsa

At 12:31 AM 10/14/2006, Scott Long wrote:


> Mike,
>
> I have a new patch that I hope addresses the actual bug, instead of
> shuffling the timing.  Would you be willing to test it?  I can't
> guarantee that it's safe for production use yet, though.  It seems
> to work, but it might set your dog on fire too.


Yes, for sure, as the box is just for testing mysql right now. I
don't think we will end up even using it in production, as the whole MB
runs insanely hot.


---Mike 




Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-13 Thread Scott Long

Mike Tancsa wrote:

> At 10:34 PM 10/5/2006, Kris Kennaway wrote:
>
> > Based on successful testing on a machine with shared em interrupt, the
> > following patch should work around the problem *in that case*.
> >
> > Note that this patch will not help you if you are not using the em
> > driver, or if you are seeing the problem with non-shared em interrupt
> > (I have investigated one such outlier, which seems to be a problem with
> > a particular model of em hardware and not a generic problem with the
> > driver).
> >
> > Please let Scott and I know whether or not this patch works for you
> > (in addition to the information previously requested, if you have not
> > already sent it).  Unfortunately it is only a workaround, but it
> > points to an underlying problem with fast interrupt handlers on a
> > shared irq that can be studied separately.
>
> I ran into an em0 timeout on a box I just started testing. The patch
> seems to fix the issue.
>
> (before the patch)
> Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
> Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
> Oct 13 21:42:58 am64 kernel: em0: link state changed to UP


Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-10 Thread Frode Nordahl

On 6 Oct 2006, at 04:34, Kris Kennaway wrote:

> On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
> > On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
> > > All,
> > >
> > > I'm seeing some patterns here with all of the network driver problem
> > > reports, but I need more information to help narrow it down further.
> > > I ask all of you who are having problems to take a minute to fill
> > > out this survey and return it to Kris Kennaway (on cc:) and myself.
> > > Thanks.
> > >
> > > 1. Are you experiencing network hangs and/or timeout messages on the
> > > console?  If yes, please provide a _brief_ description of the problem.
> >
> > OK, next question, to all em users:
> >
> > If your em device is using a shared interrupt, and you are NOT
> > experiencing timeout problems when using this device, please let me
> > know:
>
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
>
> Note that this patch will not help you if you are not using the em
> driver, or if you are seeing the problem with non-shared em interrupt
> (I have investigated one such outlier, which seems to be a problem with
> a particular model of em hardware and not a generic problem with the
> driver).
>
> Index: if_em.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
> retrieving revision 1.65.2.18
> diff -u -u -r1.65.2.18 if_em.c
> --- if_em.c	25 Aug 2006 12:38:26 -0000	1.65.2.18
> +++ if_em.c	5 Oct 2006 22:05:45 -0000
> @@ -2086,7 +2086,7 @@
>  	taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
>  	    device_get_nameunit(adapter->dev));
>  	if ((error = bus_setup_intr(dev, adapter->res_interrupt,
> -	    INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
> +	    INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
>  	    &adapter->int_handler_tag)) != 0) {
>  		device_printf(dev, "Failed to register fast interrupt "
>  			    "handler: %d\n", error);
>
> Please let Scott and I know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.


I tested this on one of my other systems where em0 and USB shares an  
interrupt, and the patch helps to remove the watchdog timeout, and  
makes the system usable.


Without it  the system will some times not come up successfully at  
all, and other times it will drop off the face of the earth as soon  
as some network I/O in combination with disk I/O is done.


--
Frode Nordahl





Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-09 Thread Ade Lovett



On Oct 5, 2006, at 19:34 , Kris Kennaway wrote:

> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.


This solves the em(4) issue for me on a shared interrupt.  Prior to  
this, the network hang (no watchdog timeouts) was trivially  
reproducible with an NFS-mounted FreeBSD repository to two builder  
boxes, and running cvs -q upd on the ports tree at the same time.  
(the builder boxes also have em(4) interfaces, which I haven't  
patched, but they're running 7.0-CURRENT).  Everything is i386.


[EMAIL PROTECTED]:/dtbox] 739# vmstat -i
...
irq21: em0 acpi0  965426857
...
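
For anyone who wants to check mechanically whether a line like the one above describes a shared interrupt, here is a minimal sketch in C. It assumes the `vmstat -i` line layout seen in the excerpt (an `irqN:` label, one or more device names, then numeric counters); the helper name `irq_device_count` is made up for illustration and is not part of any FreeBSD tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the device names on one `vmstat -i` line,
 * e.g. "irq21: em0 acpi0  965426857".
 *
 * Returns the number of devices attached to the IRQ, or -1 if the line
 * does not start with an "irqN:" label.  A result above 1 means the
 * interrupt is shared.
 */
int
irq_device_count(const char *line)
{
	const char *p = strchr(line, ':');
	int count = 0;

	if (strncmp(line, "irq", 3) != 0 || p == NULL)
		return (-1);
	p++;	/* step past the ':' */
	while (*p != '\0') {
		/* Skip the whitespace between tokens. */
		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;
		/* Device names start with a letter; counters are numeric. */
		if (isalpha((unsigned char)*p))
			count++;
		/* Skip the rest of this token. */
		while (*p != '\0' && !isspace((unsigned char)*p))
			p++;
	}
	return (count);
}
```

Calling it on each line of `vmstat -i` output and keeping the lines where the result is greater than 1 gives the list of shared IRQs; the line quoted above counts two devices (em0 and acpi0).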

-aDe



Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-06 Thread Guy Brand
Kris Kennaway ([EMAIL PROTECTED]) on 05/10/2006 at 22:34 wrote:

> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
[...]
> Please let Scott and I know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

  # mojito uptime
  14:23  up  1:59, 4 users, load averages: 0,07 0,05 0,01
  # mojito uname -v
  FreeBSD 6.2-PRERELEASE #15: Fri Oct  6 12:11:36 CEST 2006
  [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUG 

  Your patch fixes my em/nvidia issue.
  Thanks Kris

-- 
  bug



Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-05 Thread Kris Kennaway
On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
> On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
> > All,
> >
> > I'm seeing some patterns here with all of the network driver problem
> > reports, but I need more information to help narrow it down further.
> > I ask all of you who are having problems to take a minute to fill
> > out this survey and return it to Kris Kennaway (on cc:) and myself.
> > Thanks.
> >
> > 1. Are you experiencing network hangs and/or timeout messages on the
> > console?  If yes, please provide a _brief_ description of the problem.
>
> OK, next question, to all em users:
>
> If your em device is using a shared interrupt, and you are NOT
> experiencing timeout problems when using this device, please let me
> know:

Based on successful testing on a machine with shared em interrupt, the
following patch should work around the problem *in that case*.

Note that this patch will not help you if you are not using the em
driver, or if you are seeing the problem with non-shared em interrupt
(I have investigated one such outlier, which seems to be a problem with
a particular model of em hardware and not a generic problem with the
driver).

Index: if_em.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.65.2.18
diff -u -u -r1.65.2.18 if_em.c
--- if_em.c	25 Aug 2006 12:38:26 -0000	1.65.2.18
+++ if_em.c	5 Oct 2006 22:05:45 -0000
@@ -2086,7 +2086,7 @@
 	taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
 	    device_get_nameunit(adapter->dev));
 	if ((error = bus_setup_intr(dev, adapter->res_interrupt,
-	    INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
+	    INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
 	    &adapter->int_handler_tag)) != 0) {
 		device_printf(dev, "Failed to register fast interrupt "
 			    "handler: %d\n", error);

Please let Scott and I know whether or not this patch works for you
(in addition to the information previously requested, if you have not
already sent it).  Unfortunately it is only a workaround, but it
points to an underlying problem with fast interrupt handlers on a
shared irq that can be studied separately.

Kris


