Guy Brand wrote:
Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:


One thing this patch definitely did do though, is break the nvidia
driver pretty badly.  Couldn't keep the X server running for more than a
minute before it froze solid.  Lots of Xid: blah blah blah messages.
Yes I remembered to rebuild the kernel module ;)


  Hi,


  Since rebuilding to 6.2-PRERELEASE (FreeBSD 6.2-PRERELEASE #1: Mon
  Oct  2 15:24:04 CEST 2006 DEBUG  i386), I see the same kind of freeze
  on a box where em shares an IRQ with nvidia
  (NVIDIA-FreeBSD-x86-1.0-8756):

  interrupt                          total       rate
  irq1: atkbd0                           5          0
  irq14: ata0                           47          0
  irq16: nvidia0 em+                 86545        185
  irq17: fwohci0                         7          0
  irq21: twe0                         6426         13
  cpu0: timer                       927735       1986
  Total                            1020765       2185

  I freeze the box by starting firefox which reloads a few tabs I keep
  open in my session when under X. This is perfectly reproducible.
  From the logs, first I see:

    Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010597
    Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 00000000
    Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010598
    Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010599
    Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059a
    Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059b
    Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059c
    Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059d
    Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059e
    Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
    Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0

  then come the watchdogs:

    Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
    Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
    Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
    Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a4
    Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a5
    Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:49 mojito kernel: em0: link state changed to UP

  and the box ends up frozen less than a minute later. The traffic
  on the Intel card can be low (pinging a host for a few dozen
  seconds), medium (reloading a few pages in the tabs of Firefox) or
  high (downloading several iso images from our local FTP mirror):
  whatever I do, if both nvidia and em0 are used, the box freezes.

  Note that I can't freeze the box when doing several simultaneous big
  downloads or tarring up a lot of files but NOT running X. So I guess
  it is a shared nvidia/em IRQ issue.
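
  A minimal sketch of how such sharing could be spotted automatically,
  assuming the interrupt table earlier in this message is `vmstat -i`
  output saved as text. The function name shared_irqs is hypothetical,
  and treating "more than one device name on an irq line" as "shared"
  is a simplification, not a statement about how the kernel reports it.

```python
# Hypothetical helper: list IRQ lines that name more than one device.
# Assumes vmstat -i style rows: "<irqN>: <dev> [<dev> ...] <total> <rate>".

SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           5          0
irq14: ata0                           47          0
irq16: nvidia0 em+                 86545        185
irq17: fwohci0                         7          0
irq21: twe0                         6426         13
cpu0: timer                       927735       1986
Total                            1020765       2185
"""


def shared_irqs(vmstat_output: str) -> dict:
    """Map each irq name to its device list when more than one device is listed."""
    shared = {}
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Skip the header, the cpu timer row, the Total row and blank lines.
        if len(fields) < 4:
            continue
        if not (fields[0].startswith("irq") and fields[0].endswith(":")):
            continue
        # Everything between the irq label and the two counters is a device name.
        devices = fields[1:-2]
        if len(devices) > 1:
            shared[fields[0].rstrip(":")] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, "is shared by", ", ".join(devices))
```

  On the table above this reports only irq16, with nvidia0 and em+
  on the same line.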

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
  The "DEBUG" kernconf is GENERIC + witness options enabled (but they
  do not help in this case).

  I traced back to find which changeset introduced the trouble. The
  results are:

    #*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
    # OK
    ...

    #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
    # OK
    #
    #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
    # BROKEN
    ...

    #*default release=cvs tag=RELENG_6
    # BROKEN

  From the sys commit logs, the culprit commits are:

  glebius     2006-08-08 09:19:25 UTC
  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/dev/em           if_em.c
  Log:
  Sync with HEAD. This includes the following changes in chronological
  order:
  o Significant performance improvements. The interrupt handler
    schedules work to a private taskqueue. The em_rxeof() function
    runs lockless.
    rev. 1.98 - 1.101 by scottl.
    rev. 1.103 by mux
    rev. 1.106 by glebius, from Andrey V. Elsukov <bu7cher yandex.ru>
    rev. 1.116 by glebius
  o Style cleanups:
    - rev. 1.102, 1.108, 1.109 by glebius
    - rev. 1.124 by pdeuskar
  o Vendor merges:
    - Merged with vendor driver version 5.1.5 by Jack Vogel.
      rev. 1.115 by glebius
    - Merged with vendor driver version 6.0.5 by Jack Vogel.
      rev. 1.123 by glebius
  o Various fixes:
    - Invalid use of BUS_DMA_ALLOCNOW
      rev. 1.104 by scott, 1.121 by yongari
    - Link state handling cleanup.
      rev. 1.110 by glebius
    - Fix if_baudrate handling.
      rev. 1.111 by glebius
    - Honor IFF_DRV_OACTIVE in em_start_locked().
      rev. 1.117 by yongari
    - Protect EEPROM access with the driver lock.
      rev. 1.118 by yongari
    - Fix link flap on SIOCGIFADDR.
      rev. 1.119 by yongari
    - Fix DMA map handling in em_encap().
      rev. 1.120,1.122 by yongari

  Revision   Changes      Path
  1.65.2.17  +1587 -1443  src/sys/dev/em/if_em.c


  glebius     2006-08-08 09:20:26 UTC
  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/dev/em           LICENSE README if_em.h if_em_hw.c if_em_hw.h
                         if_em_osdep.h
  Log:
  Sync with HEAD, merging vendor driver updates 5.1.5, 6.0.5 by Jack Vogel.

  Revision  Changes      Path
  1.3.2.1   +1 -1        src/sys/dev/em/LICENSE
  1.10.2.1  +71 -30      src/sys/dev/em/README
  1.32.2.3  +133 -157    src/sys/dev/em/if_em.h
  1.16.2.2  +3186 -906   src/sys/dev/em/if_em_hw.c
  1.15.2.3  +712 -48     src/sys/dev/em/if_em_hw.h
  1.14.2.2  +46 -15      src/sys/dev/em/if_em_osdep.h


  I confirmed that by building a kernel from 2006.08.08.09.21.00 which
  shows the problem and a kernel from 2006.08.08.09.18.00 which works
  like a charm.
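
  The narrowing-down above amounts to a bisection over cvsup date=
  stamps. A minimal sketch of picking the next date to try, assuming
  the stamps use the yyyy.mm.dd.hh.mm.ss layout seen in the supfile
  lines above. The helper name midpoint is hypothetical; checking out
  the tree, building the kernel and testing it remain manual steps.

```python
# Hypothetical helper for bisecting between a known-good and a known-bad
# cvsup date= stamp. Only the date arithmetic is automated here.
from datetime import datetime

CVSUP_FMT = "%Y.%m.%d.%H.%M.%S"  # assumed layout of the date= field


def midpoint(good: str, bad: str) -> str:
    """Return the stamp halfway between a good and a later bad stamp."""
    good_dt = datetime.strptime(good, CVSUP_FMT)
    bad_dt = datetime.strptime(bad, CVSUP_FMT)
    if good_dt >= bad_dt:
        raise ValueError("the good stamp must be earlier than the bad stamp")
    mid = good_dt + (bad_dt - good_dt) / 2
    # Drop sub-second precision, which the stamp format cannot express.
    return mid.replace(microsecond=0).strftime(CVSUP_FMT)


if __name__ == "__main__":
    # Last OK and first BROKEN stamps from the supfile comments above.
    print(midpoint("2006.08.08.09.12.56", "2006.08.08.09.21.00"))
```

  Each result goes into the supfile's date= field; depending on whether
  that kernel freezes, it replaces either the good or the bad stamp for
  the next round.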

  Dunno if this could be linked to the em* watchdogs reported in this
  thread. Let me know if I can do something useful to help fix this
  issue.


So you tested before these two changes and after these two changes, yes?
What about with just the first change and not the second?

Anyways, I'm starting to see a trend here. Problem reports are clustering
around UP systems, not SMP systems.  I don't know if that's just
coincidence or not.

Can you try a quick test?  Reboot and press '6' at the FreeBSD loader
menu.  That will drop you to a prompt.  Then enter the following line:

set hint.apic.0.disabled=1

Then continue the boot by entering:

boot

The machine should boot up normally. If it doesn't boot, just reset the
machine and allow it to boot without the apic change. With the change,
as well as the up to date em driver, see if you still get the nvidia and
other problems.

Scott
