Panic on reboot with gmirror+gjournal

2007-04-04 Thread Guy Brand
  Hi all,


  I have a 6-STABLE system (sources from today) with two identical
  disks in a gmirror device, gm0, and 3 gjournal providers on top of it:

Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 1
Flags: NONE
GenID: 0
SyncID: 1
ID: 2763081532
Providers:
1. Name: mirror/gm0
   Mediasize: 500107861504 (466G)
   Sectorsize: 512
   Mode: r3w3e4
Consumers:
1. Name: ad5
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 2315942152
2. Name: ad6
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 3121293515


Geom name: gjournal 1524581050
ID: 1524581050
Providers:
1. Name: mirror/gm0f.journal
   Mediasize: 106300440064 (99G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gm0f
   Mediasize: 107374182400 (100G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 107374181888
   Jstart: 106300440064
   Role: Data,Journal

Geom name: gjournal 1230399956
ID: 1230399956
Providers:
1. Name: mirror/gm0g.journal
   Mediasize: 192199785984 (179G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gm0g
   Mediasize: 193273528320 (180G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 193273527808
   Jstart: 192199785984
   Role: Data,Journal

Geom name: gjournal 1616155040
ID: 1616155040
Providers:
1. Name: mirror/gm0h.journal
   Mediasize: 193061356032 (180G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gm0h
   Mediasize: 194135098368 (181G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 194135097856
   Jstart: 193061356032
   Role: Data,Journal
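
  The sizes in the listing above are self-consistent: for each of the
  three gjournal geoms, Jend - Jstart is exactly 1 GiB, the .journal
  provider's Mediasize equals Jstart, and Jend sits one 512-byte sector
  before the end of the consumer. A minimal Python sketch of that
  arithmetic, using only the numbers shown above (the dictionary layout
  and the function name are illustrative and not part of any GEOM tool):

```python
# Values copied from the listing above; keys are the consumer names.
JOURNALS = {
    "mirror/gm0f": {"consumer": 107374182400, "provider": 106300440064,
                    "jstart": 106300440064, "jend": 107374181888},
    "mirror/gm0g": {"consumer": 193273528320, "provider": 192199785984,
                    "jstart": 192199785984, "jend": 193273527808},
    "mirror/gm0h": {"consumer": 194135098368, "provider": 193061356032,
                    "jstart": 193061356032, "jend": 194135097856},
}

SECTOR = 512  # Sectorsize reported for every provider above


def journal_bytes(entry):
    """Size of the journal area in bytes (Jend - Jstart)."""
    return entry["jend"] - entry["jstart"]


for name, entry in JOURNALS.items():
    size = journal_bytes(entry)
    tail = entry["consumer"] - entry["jend"]
    print(f"{name}: journal {size} bytes ({size / 2**30:.0f} GiB), "
          f"provider ends at Jstart: {entry['provider'] == entry['jstart']}, "
          f"bytes after Jend: {tail}")
```

  The same one-sector difference shows up between mirror/gm0
  (500107861504) and its consumers ad5 and ad6 (500107862016).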

  Everything works fine until I reboot. At the end of the shutdown I
  see the 3 journals being shut down, then gm0 being destroyed but
  then gjournal tries to do something more, which is probably wrong
  as all the consumers have disappeared:

  All buffers synced
  Uptime: 16m27s
  GEOM_JOURNAL: Shutting down geom gjournal 1616155040.
  GEOM_JOURNAL: Shutting down geom gjournal 1230399956.
  GEOM_JOURNAL: Shutting down geom gjournal 1524581050.
  GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed.
  GEOM_MIRROR: Device gm0 destroyed.
  GEOM_JOURNAL:

  Fatal trap 12: page fault while in kernel mode
  ...


  After a manual reset the filesystems come up clean, of course, so
  there is not much harm, but I would rather get this fixed.

-- 
  bug

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-06 Thread Guy Brand
Kris Kennaway ([EMAIL PROTECTED]) on 05/10/2006 at 22:34 wrote:

 Based on successful testing on a machine with shared em interrupt, the
 following patch should work around the problem *in that case*.
[...]
 Please let Scott and I know whether or not this patch works for you
 (in addition to the information previously requested, if you have not
 already sent it).  Unfortunately it is only a workaround, but it
 points to an underlying problem with fast interrupt handlers on a
 shared irq that can be studied separately.

  # mojito uptime
  14:23  up  1:59, 4 users, load averages: 0,07 0,05 0,01
  # mojito uname -v
  FreeBSD 6.2-PRERELEASE #15: Fri Oct  6 12:11:36 CEST 2006
  [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUG 

  Your patch fixes my em/nvidia issue.
  Thanks Kris

-- 
  bug



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-05 Thread Guy Brand
Scott Long ([EMAIL PROTECTED]) on 04/10/2006 at 14:49 wrote:

 #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
 # OK
 #
 #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
 # BROKEN
 ...
 
 #*default release=cvs tag=RELENG_6
 # BROKEN
 
   From sys commitlogs the culprit commits are:
 
   glebius 2006-08-08 09:19:25 utc
   glebius 2006-08-08 09:20:26 utc

 So you tested before these two changes and after these two changes, yes?

  Yes that's it.

 What about with just the first change and not the second?  Anyways, I'm 

  Because a kernel that only has the first change (2006-08-08
  09:19:25) fails to build.

 Can you try a quick test?  Reboot and press '6' at the FreeBSD loader
 menu.  That will drop you to a prompt.  Then enter the following line:
 
 set hint.apic.0.disabled=1

  Done: synced to STABLE-6 as of this morning (9:00 UTC), made world
  and kernel, and booted with APIC disabled. Still the same freeze after
  starting X and loading a few tabs in Firefox.

  Thanks for the suggestion Scott.

-- 
  bug



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Guy Brand
Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:

 One thing this patch definitely did do though, is break the nvidia
 driver pretty badly.  Couldn't keep the X server running for more than a
 minute before it froze solid.  Lots of Xid: blah blah blah messages.
 Yes I remembered to rebuild the kernel module ;)

  Hi,


  Since rebuilding to 6.2-PRERELEASE (FreeBSD 6.2-PRERELEASE #1: Mon
  Oct  2 15:24:04 CEST 2006, DEBUG kernel, i386) on a box where em
  shares an IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

  interrupt                total   rate
  irq1: atkbd0                 5      0
  irq14: ata0                 47      0
  irq16: nvidia0 em+       86545    185
  irq17: fwohci0               7      0
  irq21: twe0               6426     13
  cpu0: timer             927735   1986
  Total                  1020765   2185
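
  A minimal Python sketch, using the figures from the table above, that
  parses such an interrupt table and flags sources with more than one
  device attached. The function name and the parsing rules are
  assumptions made for illustration. For irq16 the counters are taken as
  total 86545 and rate 185, the split that makes the per-source totals
  add up to the Total row.

```python
SAMPLE = """\
interrupt                total   rate
irq1: atkbd0                 5      0
irq14: ata0                 47      0
irq16: nvidia0 em+       86545    185
irq17: fwohci0               7      0
irq21: twe0               6426     13
cpu0: timer             927735   1986
Total                  1020765   2185
"""


def parse_interrupts(text):
    """Return (rows, grand_total).

    rows maps the interrupt source (e.g. "irq16") to a tuple
    (device_names, total, rate). Every token between the source and the
    two trailing counters is treated as a device name.
    """
    rows, grand_total = {}, None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "interrupt":
            continue  # blank line or column header
        if fields[0] == "Total":
            grand_total = int(fields[1])
            continue
        source = fields[0].rstrip(":")
        rows[source] = (fields[1:-2], int(fields[-2]), int(fields[-1]))
    return rows, grand_total


rows, grand_total = parse_interrupts(SAMPLE)
shared = {src: devs for src, (devs, _, _) in rows.items() if len(devs) > 1}
print("shared:", shared)
print("totals match:", sum(t for _, t, _ in rows.values()) == grand_total)
```

  Only irq16 is reported as shared here, which matches the nvidia/em
  pairing described in this message.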

  I can freeze the box by starting Firefox, which reloads a few tabs I
  keep open in my session under X. This is perfectly reproducible.
  From the logs, first I see:

Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010597
Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel
Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010598
Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010599
Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059a
Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059b
Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059c
Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059d
Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059e
Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059f
Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a0

  then come the watchdogs:

Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a1
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a2
Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a3
Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a4
Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a5
Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:49 mojito kernel: em0: link state changed to UP
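
  The em0 watchdog resets above recur at a steady pace. A minimal
  Python sketch that extracts the intervals from the six "watchdog
  timeout" lines quoted above; the function name is illustrative, and
  it assumes all lines fall on the same day, so only the time-of-day
  field is parsed:

```python
from datetime import datetime

# The six "watchdog timeout" lines from the log above.
LOG = [
    "Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting",
    "Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting",
    "Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting",
    "Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting",
    "Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting",
    "Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting",
]


def watchdog_intervals(lines):
    """Seconds between consecutive em0 watchdog timeouts.

    Assumes every line is from the same day, so only the third
    whitespace-separated field (HH:MM:SS) is parsed.
    """
    times = [
        datetime.strptime(line.split()[2], "%H:%M:%S")
        for line in lines
        if "watchdog timeout" in line
    ]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(watchdog_intervals(LOG))
```

  For these lines the intervals are 10, 10, 10, 10 and 11 seconds.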

  and the box ends up frozen less than a minute later. The traffic
  on the Intel card can be low (pinging a host for a few dozen
  seconds), medium (reloading a few pages in the tabs of Firefox) or
  high (downloading several ISO images from our local FTP mirror):
  whatever I do, if both nvidia and em0 are used, the box freezes.

  Note that I cannot freeze the box with several simultaneous big
  downloads or by tarring up a lot of files as long as X is NOT
  running. So I guess it is a shared nvidia/em IRQ issue.

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
  The DEBUG kernel config is GENERIC with the witness options enabled
  (but they do not help in this case).

  I traced back to find which changeset introduced the trouble. The
  results are:

#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
# OK
...

#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
# OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
# BROKEN
...

#*default release=cvs tag=RELENG_6
# BROKEN
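
  The narrowing above amounts to a bisection over checkout dates. A
  minimal Python sketch of that search follows. The is_broken callback
  is a hypothetical stand-in for "check out RELENG_6 at this date,
  rebuild, and try to reproduce the freeze". Here it is only simulated:
  the results above establish that the break lies between 09:12:56 and
  09:21:00 on 2006-08-08, and the exact boundary used in the simulation
  is an assumption.

```python
from datetime import datetime, timedelta


def bisect_dates(good, bad, is_broken, resolution=timedelta(minutes=10)):
    """Narrow [good, bad] until the window is no wider than `resolution`.

    `good` must test OK and `bad` must test BROKEN. Each call to
    `is_broken` stands for a full checkout, build and test cycle.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_broken(mid):
            bad = mid    # the breakage is at or before mid
        else:
            good = mid   # the breakage is after mid
    return good, bad


# Simulated test: pretend everything from this instant on is broken.
# (Assumption for the demo only; it lies inside the OK/BROKEN window.)
SIMULATED_BREAK = datetime(2006, 8, 8, 9, 19, 25)


def simulated_is_broken(when):
    return when >= SIMULATED_BREAK


good, bad = bisect_dates(
    datetime(2006, 6, 23, 17, 0, 0),   # known OK checkout date
    datetime(2006, 10, 2, 0, 0, 0),    # a date known to be BROKEN
    simulated_is_broken,
)
print("last OK:", good)
print("first BROKEN:", bad)
```

  Since every probe costs a full rebuild, halving the window each time
  keeps the number of builds logarithmic in the length of the date range.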

  From sys commitlogs the culprit commits are:

  glebius 2006-08-08 09:19:25 utc
  freebsd src repository

  modified files: