Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-17 Thread Fredrik Widlund
Hi,

We have a Dell 1950 with the same problem (bce). We tried
debug.mpsafenet=0, but to no avail. It's a very frustrating show-stopper
for us as well; we're moving all our 1950s out of the production environment.
Any help would be greatly appreciated.

See the attached mail to freebsd-current.

Kind regards,
Fredrik Widlund

---BeginMessage---
Hi,

Suddenly the problem occurred again. We are running the same setup as
below, but with debug.mpsafenet=0, and it didn't help. This is indeed a
showstopper for us; we are moving all our Dell 1950s out of the production
environment until we can solve this issue. Any help would be greatly
appreciated.

Kind regards,
Fredrik Widlund

bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred,
resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP
bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred,
resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP
[repeat 30 times]

# vmstat -i
interrupt                   total       rate
irq14: ata0                    47          0
irq16: bce0 bce1             3019          5
irq18: mfi0                   123          0
irq21: uhci0 uhci+              6          0
irq64: mpt0                  1214          2
cpu0: timer               1118344       1997
Total                     1122753       2004

Fredrik Widlund wrote:
 Hi,

 I can't reproduce the problem. Everything is exactly the same, but I get
 no timeouts and the nic seems to work without any problems.

 Kind regards,
 Fredrik Widlund


 Fredrik Widlund wrote:
   
 Hi,

 An update, right now the BCE nic seems to work, I'm not sure exactly why
 yet. I'm attaching the dmesg however.

 SAS adapter is the PERC 5I, which is handled by the MPT driver in
 6.2-Beta2. I'll continue to look at this. There are some unhandled
 events (0x12, 0x16), but these might not be needed.

 [mpi_ioc.h]
 #define MPI_EVENT_SAS_PHY_LINK_STATUS   (0x0012)
 ...
 #define MPI_EVENT_SAS_DISCOVERY (0x0016)

 [dmesg mpt part]
 mpt0: LSILogic SAS/SATA Adapter port 0xec00-0xecff mem
 0xfc7fc000-0xfc7f,0xfc7e-0xfc7e irq 64 at device 8.0 on pci2
 mpt0: [GIANT-LOCKED]
 mpt0: MPI Version=1.5.12.0
 mpt0: mpt_cam_event: 0x16
 mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
 mpt0: mpt_cam_event: 0x12
 mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
 mpt0: mpt_cam_event: 0x16
 mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).

 Kind regards,
 Fredrik Widlund

 Fredrik Widlund wrote:
   
 
 Hi,

 I'm trying to get FreeBSD working on Dell 1950 (and 2950), which is
 vital since it's no longer possible to buy 1850/2850 units here.

 Hardware:
 PE1950 Xeon 5130, 2GB 667MHz
 SAS 5I
 PERC5E

 6.1-RELEASE: not possible since SAS drives aren't found.
 6.2-BETA2: bce interfaces do not work at all, a watchdog timeout
 occurs every other second, and _no_ connectivity.

 We are also having problems with some PE1850 failing from time to time
 with watchdog timeout hangs, and have had to debug.mpsafenet=0 these.

 How can we help solve this issue? It would really be a pity to be
 forced to leave FreeBSD but we really can't afford to replace our
 choice of hardware platform.

 Kind regards,
 Fredrik Widlund





 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to
 [EMAIL PROTECTED]
 
   
   
 

 Copyright (c) 1992-2006 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 The Regents of the University of California. All rights reserved.
 FreeBSD is a registered trademark of The FreeBSD Foundation.
 FreeBSD 6.2-BETA2 #0: Mon Oct  2 03:32:44 UTC 2006
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP
 Timecounter i8254 frequency 1193182 Hz quality 0
 CPU: Intel(R) Xeon(R) CPU 5130 @ 2.00GHz (1995.01-MHz 686-class CPU)
   Origin = GenuineIntel  Id = 0x6f6  Stepping = 6
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0x4e33d<SSE3,RSVD2,MON,DS_CPL,VMX,TM2,b9,CX16,b14,b15,b18>
   AMD Features=0x2010<NX,LM>
   AMD Features2=0x1<LAHF>
   Cores per package: 2
 real memory  = 2147123200 (2047 MB)
 avail memory = 2096009216 (1998 MB)
 ACPI APIC Table: DELL   PE_SC3  
 FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
 ioapic0: Changing APIC ID to 2
 ioapic1: Changing APIC ID to 3
 ioapic1: WARNING: intbase 64 != expected base 24
 ioapic0 Version 2.0 irqs 0-23 on motherboard
 ioapic1 Version 2.0 irqs 64-87 on motherboard
 kbd1 at kbdmux0
 ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, 

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-05 Thread Guy Brand
Scott Long ([EMAIL PROTECTED]) on 04/10/2006 at 14:49 wrote:

 #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
 # OK
 #
 #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
 # BROKEN
 ...
 
 #*default release=cvs tag=RELENG_6
 # BROKEN
 
   From sys commitlogs the culprit commits are:
 
   glebius 2006-08-08 09:19:25 utc
   glebius 2006-08-08 09:20:26 utc

 So you tested before these two changes and after these two changes, yes?

  Yes that's it.

 What about with just the first change and not the second?  Anyways, I'm 

  Because building a kernel that only has the first change (2006-08-08
  09:19:25) fails.

 Can you try a quick test?  Reboot and press '6' at the FreeBSD loader
 menu.  That will drop you to a prompt.  Then enter the following line:
 
 set hint.apic.0.disabled=1

  Done: synced to STABLE-6 of this morning (9:00 UTC), made world and
  kernel and booted with APIC disabled. Still the same freeze after starting
  X and loading a few tabs in Firefox.

  Thanks for the suggestion Scott.

-- 
  bug

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Guy Brand
Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:

 One thing this patch definitely did do though, is break the nvidia
 driver pretty badly.  Couldn't keep the X server running for more than a
 minute before it froze solid.  Lots of Xid: blah blah blah messages.
 Yes I remembered to rebuild the kernel module ;)

  Hi,


  Since rebuilding to 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Mon
  Oct  2 15:24:04 CEST 2006 DEBUG  i386 on a box having em sharing
  IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

  interrupt                   total       rate
  irq1: atkbd0                    5          0
  irq14: ata0                    47          0
  irq16: nvidia0 em+          86545        185
  irq17: fwohci0                  7          0
  irq21: twe0                  6426         13
  cpu0: timer                927735       1986
  Total                     1020765       2185

  I freeze the box by starting firefox which reloads a few tabs I keep
  open in my session when under X. This is perfectly reproducible.
  From the logs, first I see:

Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
00010597
Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 
Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
00010598
Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
00010599
Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059a
Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059b
Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059c
Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059d
Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059e
Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059f
Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a0

  then come the watchdogs:

Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a1
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a2
Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a3
Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a4
Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a5
Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:49 mojito kernel: em0: link state changed to UP

  and the box ends up frozen less than a minute later. The traffic
  on the Intel card can be low (pinging a host for a few dozen
  seconds), medium (reloading a few pages in the tabs of Firefox) or
  high (downloading several iso images from our local FTP mirror):
  whatever I do, if both nvidia and em0 are used, the box freezes.
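To put a number on how regularly the resets recur, a log excerpt like the one above can be fed through a small script. This is only a sketch: the function name `watchdog_intervals` is made up for illustration, and it assumes all lines fall on the same day, so it compares HH:MM:SS only.

```python
import re

# Matches syslog lines such as:
#   Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
WATCHDOG = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) \S+ kernel: (\w+): watchdog timeout"
)

def watchdog_intervals(log_text):
    """Return seconds between consecutive watchdog resets, per interface.

    Assumption: every line is from the same day, so only the time of day
    is used and the date part of the syslog prefix is ignored.
    """
    last_seen = {}
    intervals = {}
    for line in log_text.splitlines():
        m = WATCHDOG.search(line)
        if not m:
            continue
        hours, minutes, seconds, ifname = m.groups()
        t = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if ifname in last_seen:
            intervals.setdefault(ifname, []).append(t - last_seen[ifname])
        last_seen[ifname] = t
    return intervals

# Excerpt copied from the log above.
sample = """\
Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
"""
print(watchdog_intervals(sample))
```

On the excerpt the resets come about 10 seconds apart, which matches what the full log shows.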

  Note that I can't freeze the box when doing several simultaneous big
  downloads or tarring up a lot of files but NOT running X. So I guess
  it is a shared nvidia/em IRQ issue.
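For anyone wanting to check their own box for the same kind of sharing, the `vmstat -i` output can be scanned for interrupt lines that name more than one device. A minimal sketch follows; `shared_irqs` is a hypothetical helper, and it assumes each row is a name ending in a colon, one or more device names, then the total and rate columns.

```python
def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for irq rows that list more than one device.

    Assumed row layout: 'irq16: nvidia0 em+   86545   185'
    (name, device names, total, rate). Other rows are skipped.
    """
    shared = {}
    for line in vmstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("irq"):
            continue
        irq = fields[0].rstrip(":")
        devices = fields[1:-2]  # everything between the name and the two numbers
        if len(devices) > 1:
            shared[irq] = devices
    return shared

# Rows taken from the table earlier in this message.
sample = """\
interrupt                   total       rate
irq1: atkbd0                    5          0
irq14: ata0                    47          0
irq16: nvidia0 em+          86545        185
irq17: fwohci0                  7          0
irq21: twe0                  6426         13
cpu0: timer                927735       1986
Total                     1020765       2185
"""
print(shared_irqs(sample))
```

Device names are reported exactly as vmstat prints them, including any trailing `+`.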

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
  The DEBUG kernconf is GENERIC + witness options enabled (but they
  do not help in this case).

  I traced back to find which changeset introduced the trouble. The
  results are:

#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
# OK
...

#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
# OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
# BROKEN
...

#*default release=cvs tag=RELENG_6
# BROKEN
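The narrowing-down by checkout date shown above is essentially a binary search. A rough sketch of the idea, under stated assumptions: `first_broken` and `is_broken` are hypothetical names, `is_broken` stands in for the manual "check out that date, build, boot, start X and Firefox" cycle, the July date is an invented intermediate point, and the breakage is assumed to persist once introduced.

```python
from datetime import datetime

def first_broken(dates, is_broken):
    """Binary-search a sorted list of checkout dates.

    Assumes dates[0] is known good and dates[-1] is known broken.
    Returns the adjacent (last_good, first_broken) pair.
    """
    lo, hi = 0, len(dates) - 1  # lo: known good, hi: known broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(dates[mid]):
            hi = mid
        else:
            lo = mid
    return dates[lo], dates[hi]

dates = [
    datetime(2006, 6, 23, 17, 0, 0),   # OK (from the list above)
    datetime(2006, 7, 15, 0, 0, 0),    # invented intermediate point
    datetime(2006, 8, 8, 9, 12, 56),   # OK
    datetime(2006, 8, 8, 9, 21, 0),    # BROKEN
    datetime(2006, 10, 2, 15, 24, 4),  # BROKEN (current kernel)
]

# Stand-in for the real test: pretend anything at or after the first
# suspect commit time is broken.
def is_broken(checkout_date):
    return checkout_date >= datetime(2006, 8, 8, 9, 19, 25)

good, bad = first_broken(dates, is_broken)
print(good, bad)
```

With these inputs the search lands on the same 09:12:56 / 09:21:00 window as the manual bisection.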

  From sys commitlogs the culprit commits are:

  glebius 2006-08-08 09:19:25 utc
  freebsd src repository

  modified files:

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Scott Long [EMAIL PROTECTED]:

 Corrected patch is at:
 
 http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

I have a Dell 1950 here that's been dedicated to helping solve this
problem.  I can reliably reproduce the watchdog timeout by doing
the following steps:

1) Mount /usr/src via nfs
2) start a -j99 buildworld
3) On a different terminal, do tar czvf /usr/src/temp.tgz /big/directory

Usually only takes a few minutes before a watchdog occurs, and I have
no more networking.
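For others who want to try the same load pattern, the three steps can be scripted roughly as follows. This is a sketch, not the exact procedure used here: `repro_commands` and `run_repro` are made-up names, the NFS export string (for example "server:/usr/src") is a placeholder, and running it needs root and will overwrite /usr/src/temp.tgz.

```python
import subprocess

def repro_commands(nfs_export, big_dir, jobs=99):
    """Build the three commands from the steps above as argument lists."""
    return [
        ["mount", "-t", "nfs", nfs_export, "/usr/src"],   # step 1
        ["make", f"-j{jobs}", "buildworld"],               # step 2, run in /usr/src
        ["tar", "czvf", "/usr/src/temp.tgz", big_dir],     # step 3
    ]

def run_repro(nfs_export, big_dir):
    """Mount, then run buildworld and tar side by side until both finish."""
    mount_cmd, build_cmd, tar_cmd = repro_commands(nfs_export, big_dir)
    subprocess.run(mount_cmd, check=True)
    build = subprocess.Popen(build_cmd, cwd="/usr/src")  # keeps running ...
    tar = subprocess.Popen(tar_cmd)                      # ... while tar adds NFS writes
    tar.wait()
    build.wait()
```

Calling `run_repro("server:/usr/src", "/big/directory")` would start both loads; nothing runs until that call is made.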

Your patch applied cleanly, and everything built OK.  The results are:
a) My USB keyboard stopped working :(
b) The problem does _not_ improve.

In my case, it's a bce driver that's doing it.  I also have some em
cards in this machine that I can test if the information will be
helpful.

This is quite a show-stopper for us, if there's any other testing/etc
I can do, _please_ let me know.  I might even be able to get remote
console access to this machine approved for a developer.

-- 
Bill Moran
Collaborative Fusion Inc.


IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Bill Moran [EMAIL PROTECTED]:

 In my case, it's a bce driver that's doing it.  I also have some em
 cards in this machine that I can test if the information will be
 helpful.

Note that I can _not_ reproduce the problem with an em interface (a
PCI NIC).  As mentioned earlier, I can reliably and easily produce
a watchdog timeout on the bce interface (onboard).  The em interface
seems rock-solid.

I guess I have a workaround for now, but the offer to test/provide
more information stands.

-- 
Bill Moran
Collaborative Fusion Inc.




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Kris Kennaway
On Wed, Oct 04, 2006 at 10:40:25AM -0400, Bill Moran wrote:
 In response to Scott Long [EMAIL PROTECTED]:
 
  Corrected patch is at:
  
  http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
 
 I have a Dell 1950 here that's been dedicated to helping solve this
 problem.  I can reliably reproduce the watchdog timeout by doing
 the following steps:
 
 1) Mount /usr/src via nfs
 2) start a -j99 buildworld
 3) On a different terminal, do tar czvf /usr/src/temp.tgz /big/directory
 
 Usually only takes a few minutes before a watchdog occurs, and I have
 no more networking.
 
 Your patch applied cleanly, and everything built OK.  The results are:
 a) My USB keyboard stopped working :(
 b) The problem does _not_ improve.
 
 In my case, it's a bce driver that's doing it.  I also have some em
 cards in this machine that I can test if the information will be
 helpful.
 
 This is quite a show-stopper for us, if there's any other testing/etc
 I can do, _please_ let me know.  I might even be able to get remote
 console access to this machine approved for a developer.

Remote console access would be a help.  I suspect there may be more
than one problem here.

Kris




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Mike Tancsa

At 12:27 PM 10/4/2006, Bill Moran wrote:

In response to Bill Moran [EMAIL PROTECTED]:

 In my case, it's a bce driver that's doing it.  I also have some em
 cards in this machine that I can test if the information will be
 helpful.

Note that I can _not_ reproduce the problem with an em interface (a
PCI NIC).  As mentioned earlier, I can reliably and easily produce


Hi, Just to clarify, you mean without the patch you do run into the 
problem, but with the patch you cannot generate the problem ? Or with 
the em NIC, you have never seen the issue at all ?


---Mike 




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Mike Tancsa [EMAIL PROTECTED]:

 At 12:27 PM 10/4/2006, Bill Moran wrote:
 In response to Bill Moran [EMAIL PROTECTED]:
 
   In my case, it's a bce driver that's doing it.  I also have some em
   cards in this machine that I can test if the information will be
   helpful.
 
 Note that I can _not_ reproduce the problem with an em interface (a
 PCI NIC).  As mentioned earlier, I can reliably and easily produce
 
 Hi, Just to clarify, you mean without the patch you do run into the 
 problem, but with the patch you cannot generate the problem ? Or with 
 the em NIC, you have never seen the issue at all ?

Without patch:
* bce locks up easily
* Unable to lock up em
* keyboard works
With patch:
* bce locks up easily
* unable to lock up em
* keyboard doesn't work

-- 
Bill Moran
Collaborative Fusion Inc.




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Kris Kennaway [EMAIL PROTECTED]:

  This is quite a show-stopper for us, if there's any other testing/etc
  I can do, _please_ let me know.  I might even be able to get remote
  console access to this machine approved for a developer.
 
 Remote console access would be a help.  I suspect there may be more
 than one problem here.

In progress ... I'll contact you privately when it's ready.

-- 
Bill Moran
Collaborative Fusion Inc.




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Scott Long

Guy Brand wrote:

Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:



One thing this patch definitely did do though, is break the nvidia
driver pretty badly.  Couldn't keep the X server running for more than a
minute before it froze solid.  Lots of Xid: blah blah blah messages.
Yes I remembered to rebuild the kernel module ;)



  Hi,


  Since rebuilding to 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Mon
  Oct  2 15:24:04 CEST 2006 DEBUG  i386 on a box having em sharing
  IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

  interrupt                   total       rate
  irq1: atkbd0                    5          0
  irq14: ata0                    47          0
  irq16: nvidia0 em+          86545        185
  irq17: fwohci0                  7          0
  irq21: twe0                  6426         13
  cpu0: timer                927735       1986
  Total                     1020765       2185

  I freeze the box by starting firefox which reloads a few tabs I keep
  open in my session when under X. This is perfectly reproducible.
  From the logs, first I see:

Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
00010597
Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 
Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
00010598
Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
00010599
Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059a
Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059b
Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059c
Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059d
Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059e
Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
0001059f
Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a0

  then come the watchdogs:

Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a1
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a2
Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a3
Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a4
Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 
000105a5
Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:49 mojito kernel: em0: link state changed to UP

  and the box ends up frozen less than a minute later. The traffic
  on the Intel card can be low (pinging a host for a few dozen
  seconds), medium (reloading a few pages in the tabs of Firefox) or
  high (downloading several iso images from our local FTP mirror):
  whatever I do, if both nvidia and em0 are used, the box freezes.

  Note that I can't freeze the box when doing several simultaneous big
  downloads or tarring up a lot of files but NOT running X. So I guess
  it is a shared nvidia/em IRQ issue.

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
  The DEBUG kernconf is GENERIC + witness options enabled (but they
  do not help in this case).

  I traced back to find which changeset introduced the trouble. The
  results are:

#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
# OK
...

#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
# OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
# BROKEN
...

#*default release=cvs tag=RELENG_6
# BROKEN

  From sys commitlogs the culprit commits are:

  glebius 2006-08-08 09:19:25 utc
  freebsd src repository

  

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Martin Blapp


Hi,

What about with just the first change and not the second?  Anyways, I'm 
starting to see a trend here.  Problem reports are clustering around UP

systems, not SMP systems.  I don't know if that's just coincidence or not.


We've got also about twenty SMP Systems, seven of them now with 6.1 Prerelease 
and we don't have any affected systems. bge- and em- cards are working fine, 
even under high load situations.


Martin


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Jorge Aldana
I have also been using em (on-board NIC) with SMP without any problems. I just 
upgraded to check and all is still fine:


New kernel : FreeBSD 6.2-PRERELEASE #7: Mon Oct  2 15:15:47 PDT 2006
Old kernel : FreeBSD 6.1-STABLE #4: Wed Sep  6 16:01:23 PDT 2006

I also have nvidia and use firefox with pre-saved tabs (~30), all works fine 
even on re-loading.


Let me know if you would like any other info.

Jorge

On Thu, 5 Oct 2006, Martin Blapp wrote:



Hi,

We've got also about twenty SMP Systems, seven of them now with 6.1 
Prerelease and we don't have any affected systems. bge- and em- cards are 
working fine, even under high load situations.


Martin



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Brian

Martin Blapp wrote:


Hi,

What about with just the first change and not the second?  Anyways, 
I'm starting to see a trend here.  Problem reports are clustering 
around UP
systems, not SMP systems.  I don't know if that's just coincidence or 
not.


We've got also about twenty SMP Systems, seven of them now with 6.1 
Prerelease and we don't have any affected systems. bge- and em- cards 
are working fine, even under high load situations.


Martin

I remember having this problem a few years ago on an OpenBSD box with 2
NICs.  At that time, I found a mailing list post outlining a process
where you'd enter a break sequence to get to a command prompt before
booting and enter some command there, I believe to disable ACPI, and
that would help.  It's been 3-4 years, so I don't remember the details.



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Paul Allen
From Kris Kennaway [EMAIL PROTECTED], Fri, Sep 29, 2006 at 09:42:42PM -0400:
 On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
  On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
   On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
   
   At first glance it appeared to work, but I'm about to do some more
   testing since I just discovered that I have to kldload something
   (anything) first in order to reproduce the problem.  Weird.
  
  I can confirm that despite the other side effect I already mentioned,
  this patch does fix or at least mask the problem I'm seeing with em (and
  probably usb).
 
 Which is odd since the hypothesis Scott was working on should have
 shown up clearly in the mutex trace, but did not.

But it is consistent with there being a beat-frequency problem with 
respect to the scheduler.  I think the number you really need is not
how long giant was held but how long was spent waiting for it.
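To make the distinction concrete: given a trace of lock acquisitions, hold time and wait time are two different sums. The sketch below is purely illustrative; the record layout, the thread names and all timestamps are invented and are not taken from the actual mutex trace discussed in this thread.

```python
from collections import namedtuple

# One record per lock acquisition: when the thread asked for the lock,
# when it actually got it, and when it released it (arbitrary time units).
Acquisition = namedtuple("Acquisition", "thread requested acquired released")

def lock_times(records):
    """Return (total_hold, total_wait) summed over all acquisitions."""
    hold = sum(r.released - r.acquired for r in records)
    wait = sum(r.acquired - r.requested for r in records)
    return hold, wait

# Invented example: one long holder makes two other threads queue up.
trace = [
    Acquisition("usb", requested=0, acquired=0, released=50),
    Acquisition("em_taskq", requested=10, acquired=50, released=55),
    Acquisition("nvidia", requested=20, acquired=55, released=60),
]
hold, wait = lock_times(trace)
print(hold, wait)
```

In this made-up trace the total wait (75) exceeds the total hold (60), because two threads queue behind one long holder at the same time.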


Paul



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Scott Long

Craig Boston wrote:


On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:


http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff



At first glance it appeared to work, but I'm about to do some more
testing since I just discovered that I have to kldload something
(anything) first in order to reproduce the problem.  Weird.

One thing this patch definitely did do though, is break the nvidia
driver pretty badly.  Couldn't keep the X server running for more than a
minute before it froze solid.  Lots of Xid: blah blah blah messages.
Yes I remembered to rebuild the kernel module ;)

Oh, and if anyone is curious, I am able to reproduce the problem after
booting without nvidia.ko loaded, using qemu in -nographic mode.  Just
wanted to rule that out since its code that's out of our control and
would be a prime target to blame if I didn't.

Craig


My patch shouldn't have a single effect on nvidia.  It just gets the USB 
out of the way of other drivers.  Weird.  But what does 'blah blah' 
translate into?


Scott



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Kris Kennaway
On Fri, Sep 29, 2006 at 11:05:35PM -0700, Paul Allen wrote:
 From Kris Kennaway [EMAIL PROTECTED], Fri, Sep 29, 2006 at 09:42:42PM 
 -0400:
  On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
   On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
 http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

At first glance it appeared to work, but I'm about to do some more
testing since I just discovered that I have to kldload something
(anything) first in order to reproduce the problem.  Weird.
   
   I can confirm that despite the other side effect I already mentioned,
   this patch does fix or at least mask the problem I'm seeing with em (and
   probably usb).
  
  Which is odd since the hypothesis Scott was working on should have
  shown up clearly in the mutex trace, but did not.
 
 But it is consistent with there being a beat-frequency problem with 
 respect to the scheduler.  I think the number you really need is not
 how long giant was held but how long was spent waiting for it.

It also seemed to show that nothing was really waiting for it (the
cnt_* entries).

Kris




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Craig Boston
On Sat, Sep 30, 2006 at 12:14:17AM -0600, Scott Long wrote:
 One thing this patch definitely did do though, is break the nvidia
 driver pretty badly.  Couldn't keep the X server running for more than a
 minute before it froze solid.  Lots of Xid: blah blah blah messages.
 Yes I remembered to rebuild the kernel module ;)
 
 My patch shouldn't have a single effect on nvidia.  It just gets the USB 
 out of the way of other drivers.  Weird.  But what does 'blah blah' 
 translate into?

It didn't make any sense to me either after looking at the patch...  I'm
100% sure that was the only change between boots, and it started working
again after I reverted the sys/dev/usb directory and rebuilt. (svk is
great for juggling patch sets around)

That's one of the reasons I briefly suspected the nvidia driver causing
problems somewhere, so I removed that from the mix just to be sure.

'blah blah' translates into numbers that mean nothing to me, but they
may be useful to someone:

Sep 29 16:57:09 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae5
Sep 29 16:57:09 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae4
Sep 29 16:57:11 kernel: NVRM: Xid (0001:00): 8, Channel 
Sep 29 16:57:17 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae6
Sep 29 16:57:17 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae5
Sep 29 16:57:19 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:25 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae7
Sep 29 16:57:25 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae6
Sep 29 16:57:27 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:33 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae8
Sep 29 16:57:33 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae7
Sep 29 16:57:35 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:41 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae9
Sep 29 16:57:41 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae8
Sep 29 16:57:43 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:49 kernel: NVRM: Xid (0001:00): 16, Head  Count 0aea
Sep 29 16:58:19 kernel: NVRM: Xid (0001:00): 8, Channel 
Sep 29 16:58:27 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:58:51 last message repeated 3 times
Sep 29 16:58:51 kernel: NVRM: Xid (0001:00): 7, Ch 0001 M D bfef0007 intr 0001

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Craig Boston
On Sat, Sep 30, 2006 at 02:39:06PM -0400, Kris Kennaway wrote:
   Which is odd since the hypothesis Scott was working on should have
   shown up clearly in the mutex trace, but did not.
  
  But it is consistent with there being a beat-frequency problem with 
  respect to the scheduler.  I think the number you really need is not
  how long giant was held but how long was spent waiting for it.
 
 It also seemed to show that nothing was really waiting for it (the
 cnt_* entries).

I can set up a serial console and poke around in DDB during my test case
if anyone thinks some useful information can be found.

Unfortunately I'm remote from the machine right now so I won't be able
to do that until Monday :/

Craig


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Craig Boston
I've been experiencing this problem too, along with my USB keyboard
acting 'wonky' (stuttering from time to time).  For me at least it seems
to be tied to CPU usage, meaning it's probably related to the taskqueue
or maybe even the scheduler.  I can also reproduce the problem on a much
bigger scale than I've seen mentioned anywhere else (up to 30 seconds!).

One sure-fire way to trigger it for me is to boot the Ubuntu 6.06.1 CD
inside of qemu.  I don't have kqemu or anything loaded -- it can be
provoked by an ordinary process running as an ordinary user.

While it's sitting at the GRUB screen (30 second countdown), my USB keyboard
becomes inoperable, and em0 goes totally dead.  It feels like no interrupts
are getting through -- if a key was pressed it will repeat until the 30
seconds are up or I kill the process.

I initially suspected something holding GIANT for a long time, so I
tried the giantless USB patches but that didn't help.

Interestingly, I have another em interface in this machine but it
continues to work.

em0 is sharing irq19 with uhci1 (which the keyboard is attached to).
em1 is on irq18.  So whatever it is somehow stops irq19 from getting
through, but the other IRQ lines seem unaffected.  Sounding more and
more like an APIC problem to me.  Or possibly the ithread getting stuck.
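
A quick way to check for the kind of sharing described above is to group the
devices on each line of `vmstat -i` by IRQ.  What follows is only a sketch in
Python: the `shared_irqs` helper and the `SAMPLE` text are hypothetical (the
sample is made up to mirror the em0/uhci1 layout described here, not captured
from this machine), and it assumes every `irqN:` line ends with the total and
rate columns.

```python
def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for every IRQ line that lists more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        if not line.startswith("irq"):
            continue  # skip the header, the cpu timer lines and the Total line
        irq, _, rest = line.partition(":")
        fields = rest.split()
        devices = fields[:-2]  # assumption: last two fields are total and rate
        if len(devices) > 1:
            shared[irq] = devices
    return shared


# Hypothetical sample mirroring the layout described in this message.
SAMPLE = """\
interrupt                        total    rate
irq18: em1                       12345       6
irq19: em0 uhci1                 67890      33
cpu0: timer                    1118344    1997
"""

print(shared_irqs(SAMPLE))
```

Note that vmstat may truncate a long device list (it can show something like
`uhci+`), so a check like this can miss devices on a crowded line.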

This machine *DID* work fine until sometime between 6.1 release and now.
Unfortunately I can't seem to reproduce the problem on any of my test
machines, only on the one that I need for day to day work :)

I'm about halfway through reading the thread, but will be happy to test
any patches and do whatever I can to help.

Craig

[EMAIL PROTECTED]:10:0:  class=0x02 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82540EM Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet
[EMAIL PROTECTED]:12:0:  class=0x02 card=0x002e1028 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82540EM Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet

On Thu, Sep 28, 2006 at 08:13:51AM -0600, Scott Long wrote:
 All,
 
 Attached is my first cut at addressing the problems described in this 
 thread.  As I discussed earlier, the VM syncer thread is likely starving
 the USB interrupt thread.  This causes the shared usb+network interrupt 
 to remain masked, preventing network interrupts from being delivered,
 and thus triggering watchdog timeouts.
 
 This patch only addresses the USB driver.  If your network card is 
 sharing an interrupt with something other than (or additional to) USB,
 this might not work for you.  Also, this patch is just a very rough
 proof-of-concept and is not meant for production use.  But I'd like to
 get feedback now before I spend more time on this.  If this works then
 I'll clean it up and make it suitable for the release, and I'll look at
 some of the other drivers like ichsmb.
 
 If this is to be fixed for 6.2, I need lots of feedback ASAP.  So please
 do not be shy =-)  The patch is at:
 
 http://people.freebsd.org/~scottl/usb_fastintr.diff
 
 Scott
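
The failure mode described in the quoted message can be illustrated with a toy
model.  This is a sketch in Python under loud assumptions, not a description
of the actual FreeBSD interrupt code: time is counted in abstract ticks, the
shared line is treated as masked on every tick where the USB interrupt thread
is starved, and the names `simulate`, `starved` and `watchdog_ticks` are
invented for the example.

```python
def simulate(ticks, starved, watchdog_ticks):
    """Return the ticks at which the NIC watchdog fires in this toy model.

    starved: set of ticks on which the USB interrupt thread cannot run,
    which (by assumption) leaves the shared interrupt line masked.
    """
    last_nic_service = 0  # last tick on which the NIC handler ran
    fired = []
    for t in range(1, ticks + 1):
        masked = t in starved
        if not masked:
            last_nic_service = t  # NIC interrupt is delivered
        elif t - last_nic_service >= watchdog_ticks:
            fired.append(t)       # watchdog timeout, driver resets the NIC
            last_nic_service = t  # the reset restarts the watchdog timer
    return fired


# No starvation: the NIC is serviced every tick, so the watchdog never fires.
print(simulate(20, set(), 5))
# USB thread starved for ticks 5..14: the watchdog fires repeatedly.
print(simulate(20, set(range(5, 15)), 5))
```

In this model a starvation window shorter than `watchdog_ticks` never trips the
watchdog, which would fit the problem showing up only under sustained load.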
 


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Craig Boston
On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
 http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

At first glance it appeared to work, but I'm about to do some more
testing since I just discovered that I have to kldload something
(anything) first in order to reproduce the problem.  Weird.

One thing this patch definitely did do though, is break the nvidia
driver pretty badly.  Couldn't keep the X server running for more than a
minute before it froze solid.  Lots of Xid: blah blah blah messages.
Yes I remembered to rebuild the kernel module ;)

Oh, and if anyone is curious, I am able to reproduce the problem after
booting without nvidia.ko loaded, using qemu in -nographic mode.  Just
wanted to rule that out since it's code that's out of our control and
would be a prime target to blame if I didn't.

Craig


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Craig Boston
On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
 On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
  http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
 
 At first glance it appeared to work, but I'm about to do some more
 testing since I just discovered that I have to kldload something
 (anything) first in order to reproduce the problem.  Weird.

I can confirm that despite the other side effect I already mentioned,
this patch does fix or at least mask the problem I'm seeing with em (and
probably usb).

Craig


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Kris Kennaway
On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
 On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
  On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
   http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
  
  At first glance it appeared to work, but I'm about to do some more
  testing since I just discovered that I have to kldload something
  (anything) first in order to reproduce the problem.  Weird.
 
 I can confirm that despite the other side effect I already mentioned,
 this patch does fix or at least mask the problem I'm seeing with em (and
 probably usb).

Which is odd since the hypothesis Scott was working on should have
shown up clearly in the mutex trace, but did not.

Kris




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread O. Hartmann

Scott Long wrote:

All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.

This patch only addresses the USB driver.  If your network card is
sharing an interrupt with something other than (or additional to) USB,
this might not work for you.  Also, this patch is just a very rough
proof-of-concept and is not meant for production use.  But I'd like to
get feedback now before I spend more time on this.  If this works then
I'll clean it up and make it suitable for the release, and I'll look at
some of the other drivers like ichsmb.

If this is to be fixed for 6.2, I need lots of feedback ASAP.  So please
do not be shy =-)  The patch is at:

http://people.freebsd.org/~scottl/usb_fastintr.diff

Scott


patch does not work on my box:

cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -march=athlon64 
-Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-fformat-extensions -std=c99  -nostdinc -I-  -I. -I/usr/src/sys 
-I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter 
-I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath 
-I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm 
-I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param 
inline-unit-growth=100 --param large-function-growth=1000  
-mcmodel=kernel -mno-red-zone  -mfpmath=387 -mno-sse -mno-sse2 -mno-mmx 
-mno-3dnow  -msoft-float -fno-asynchronous-unwind-tables -ffreestanding 
-Werror  /usr/src/sys/dev/usb/usb.c

/usr/src/sys/dev/usb/usb.c: In function `usb_attach':
/usr/src/sys/dev/usb/usb.c:282: error: `usb_intr_task' undeclared (first use in this function)
/usr/src/sys/dev/usb/usb.c:282: error: (Each undeclared identifier is reported only once
/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used
*** Error code 1

Stop in /usr/obj/usr/src/sys/THOR.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.


Uname:

FreeBSD my.box.org 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #85: Thu Sep 28 
17:09:24 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/THOR  amd64



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Scott Long

O. Hartmann wrote:

Scott Long wrote:


All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.

This patch only addresses the USB driver.  If your network card is
sharing an interrupt with something other than (or additional to) USB,
this might not work for you.  Also, this patch is just a very rough
proof-of-concept and is not meant for production use.  But I'd like to
get feedback now before I spend more time on this.  If this works then
I'll clean it up and make it suitable for the release, and I'll look at
some of the other drivers like ichsmb.

If this is to be fixed for 6.2, I need lots of feedback ASAP.  So please
do not be shy =-)  The patch is at:

http://people.freebsd.org/~scottl/usb_fastintr.diff

Scott



patch does not work on my box:

cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -march=athlon64 
-Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-fformat-extensions -std=c99  -nostdinc -I-  -I. -I/usr/src/sys 
-I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter 
-I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath 
-I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm 
-I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param 
inline-unit-growth=100 --param large-function-growth=1000  
-mcmodel=kernel -mno-red-zone  -mfpmath=387 -mno-sse -mno-sse2 -mno-mmx 
-mno-3dnow  -msoft-float -fno-asynchronous-unwind-tables -ffreestanding 
-Werror  /usr/src/sys/dev/usb/usb.c

/usr/src/sys/dev/usb/usb.c: In function `usb_attach':
/usr/src/sys/dev/usb/usb.c:282: error: `usb_intr_task' undeclared (first use in this function)
/usr/src/sys/dev/usb/usb.c:282: error: (Each undeclared identifier is reported only once
/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used
*** Error code 1

Stop in /usr/obj/usr/src/sys/THOR.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.



I accidentally posted a patch against HEAD, not RELENG_6.  I'll correct 
that shortly.


Scott


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Mike Tancsa

At 03:15 PM 9/28/2006, O. Hartmann wrote:


/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used
*** Error code 1



Are you sure the patch applied cleanly to STABLE?  There are a
couple of spots you need to change manually, as it assumes the version
of USB from HEAD.

If you are using STABLE, manually apply the patch for usb.c and
ohci_pci.c and remove the offending bits from the patch; it should
then compile cleanly.


---Mike 




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Scott Long

Mike Tancsa wrote:

At 03:15 PM 9/28/2006, O. Hartmann wrote:


/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used

*** Error code 1




Are you sure the patch applied cleanly to STABLE ?  There are a couple 
of spots you need to change manually as it assumes the version of USB 
from HEAD.


Manually apply the patch for usb.c and ohci_pci.c if you are using 
STABLE and remove the offending bits from the patch and it should 
compile cleanly.


---Mike


Corrected patch is at:

http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

Sorry for the confusion.

Scott



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Mike Jakubik

Scott Long wrote:

All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.


Just to be clear, has it been established that the problem only occurs
when em is sharing an interrupt? I have a lot of production machines
using the PDSMi board, which is one of the boards that the problem was
noticed on.  However, I do not share any IRQs; I always disable USB in
the BIOS.


# vmstat -i
interrupt                        total    rate
irq16: em0                    13001181       7
irq19: atapci0                76559511      42
cpu0: timer                 3643365617    1999
cpu1: timer                 3643365610    1999
Total                       7376291919    4048




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Philippe Pegon

Mike Jakubik wrote:

Scott Long wrote:

All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.


Just to be clear, has it been established that the problem only occurs
when em is sharing an interrupt? I have a lot of production machines
using the PDSMi board, which is one of the boards that the problem was
noticed on.  However, I do not share any IRQs; I always disable USB in
the BIOS.


On many of our servers, we have bge cards and I can see a lot of
watchdog timeouts. We always disable USB in the BIOS, and the cards do
not share an IRQ.




# vmstat -i
interrupt                        total    rate
irq16: em0                    13001181       7
irq19: atapci0                76559511      42
cpu0: timer                 3643365617    1999
cpu1: timer                 3643365610    1999
Total                       7376291919    4048


Example with our FTP server (ftp8.fr.freebsd.org), an HP DL360 G4 SMP:

# vmstat -i
interrupt                        total    rate
irq1: atkbd0                      1576       0
irq4: sio0                           3       0
irq6: fdc0                          12       0
irq14: ata0                         57       0
irq24: ciss1                  17181184       8
irq25: bge0                  841821262     402
irq26: bge1                  674342644     322
irq72: ciss0                  24194679      11
cpu0: timer                 4180478365    1999
cpu1: timer                 4180886439    1999
Total                       9918906221    4743

# bzgrep watchdog /var/log/messages*
/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog timeout -- resetting
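
The grep output above can be tallied per interface to see which NIC times out
most often.  A minimal sketch in Python, assuming each log entry sits on a
single line in the `<interface>: watchdog timeout` form shown above; the
`count_watchdog_timeouts` name is made up for illustration, and the sample
lines are copied from the output above.

```python
import re
from collections import Counter

# Matches e.g. "kernel: bge1: watchdog timeout -- resetting"
PATTERN = re.compile(r"kernel: (\w+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Count watchdog timeout messages per interface name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


LOG_LINES = [
    "/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting",
    "/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting",
    "/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog timeout -- resetting",
    "/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog timeout -- resetting",
    "/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: watchdog timeout -- resetting",
    "/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting",
    "/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog timeout -- resetting",
]

for interface, count in sorted(count_watchdog_timeouts(LOG_LINES).items()):
    print(interface, count)
```

On these seven entries it reports two timeouts for bge0 and five for bge1.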


# pciconf -lv
[EMAIL PROTECTED]:0:0: class=0x06 card=0x32000e11 chip=0x35908086 rev=0x0a hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'E752x Server Memory Controller Hub'
    class    = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x0050 chip=0x35958086 rev=0x0a hdr=0x01
    vendor   = 'Intel Corporation'
    device   = 'E752x Memory Controller Hub PCI Express Port A0'
    class    = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:4:0: class=0x060400 card=0x0050 chip=0x35978086 rev=0x0a hdr=0x01
    vendor   = 'Intel Corporation'
    device   = 'E752x Memory Controller Hub PCI Express Port B0'
    class    = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:6:0: class=0x060400 card=0x0050 chip=0x35998086 rev=0x0a hdr=0x01
    vendor   = 'Intel Corporation'
    device   = 'E752x Memory Controller Hub PCI Express Port C0'
    class    = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:28:0: class=0x060400 card=0x0050 chip=0x25ae8086 rev=0x02 hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '6300ESB Hub Interface to PCI-X Bridge'
    class    = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:30:0: class=0x060400 card=0x chip=0x244e8086 rev=0x0a hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '82801BA/CA/DB/DBL/EB/ER/FB (ICH2/3/4/4/5/5/6), 6300ESB Hub Interface to PCI Bridge'
    class    = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:31:0: class=0x060100 card=0x chip=0x25a18086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '6300ESB LPC Interface Bridge'
    class    = bridge
    subclass = PCI-ISA
[EMAIL PROTECTED]:31:1: class=0x01018a card=0x32010e11 chip=0x25a28086 rev=0x02 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '6300ESB IDE Controller'
    class    = mass storage
    subclass = ATA
[EMAIL PROTECTED]:0:0: class=0x060400 card=0x0044 chip=0x03298086 rev=0x09 hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '6700PXH PCI Express-to-PCI Express Bridge A'
    class    = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:0:2: class=0x060400 card=0x0044 chip=0x032a8086 rev=0x09 hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '6700PXH PCI Express-to-PCI Express Bridge B'
    class    = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:1:0: class=0x010400 card=0x409b0e11 chip=0x00460e11 rev=0x01 hdr=0x00


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Pete French
 On many of our servers, we have bge cards and I can see a lot of
 watchdog timeouts. We always disable USB in the BIOS, and the cards do
 not share an IRQ.

I see the same thing - we have a number of HP blades which use bge interfaces
and I get many watchdog timeouts on them. These are also not sharing any
interrupts:

interrupt                        total    rate
irq1: atkbd0                         2       0
irq24: ciss0                     13208      11
irq74: bge1                 1452046216     120
cpu0: timer                 2581779930     214
cpu2: timer                 2579262777     214
cpu1: timer                 2581771929     214
cpu3: timer                 2579262777     214
Total                      11909678839     989

This is 6.1 - I have a couple of boxes running 6.2 and those have not
shown any timeouts so far. They are, however, far more lightly loaded.

-pete.