Panic on reboot with gmirror+gjournal
Hi all,

I have a setup on 6-STABLE from today with two identical disks in a gm0 provider and 3 gjournal providers on it:

Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 1
Flags: NONE
GenID: 0
SyncID: 1
ID: 2763081532
Providers:
1. Name: mirror/gm0
   Mediasize: 500107861504 (466G)
   Sectorsize: 512
   Mode: r3w3e4
Consumers:
1. Name: ad5
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 2315942152
2. Name: ad6
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 3121293515

Geom name: gjournal 1524581050
ID: 1524581050
Providers:
1. Name: mirror/gm0f.journal
   Mediasize: 106300440064 (99G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gm0f
   Mediasize: 107374182400 (100G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 107374181888
   Jstart: 106300440064
   Role: Data,Journal

Geom name: gjournal 1230399956
ID: 1230399956
Providers:
1. Name: mirror/gm0g.journal
   Mediasize: 192199785984 (179G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gm0g
   Mediasize: 193273528320 (180G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 193273527808
   Jstart: 192199785984
   Role: Data,Journal

Geom name: gjournal 1616155040
ID: 1616155040
Providers:
1. Name: mirror/gm0h.journal
   Mediasize: 193061356032 (180G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: mirror/gm0h
   Mediasize: 194135098368 (181G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 194135097856
   Jstart: 193061356032
   Role: Data,Journal

Everything works fine until I reboot. At the end of the shutdown I see the 3 journals being shut down, then gm0 being destroyed, but then gjournal tries to do something more, which is probably wrong as all the consumers have disappeared:

All buffers synced
Uptime: 16m27s
GEOM_JOURNAL: Shutting down geom gjournal 1616155040.
GEOM_JOURNAL: Shutting down geom gjournal 1230399956.
GEOM_JOURNAL: Shutting down geom gjournal 1524581050.
GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed.
GEOM_MIRROR: Device gm0 destroyed.
GEOM_JOURNAL:

Fatal trap 12: page fault while in kernel mode
...

After a manual reset the filesystems come up clean, of course, so there is not much harm, but it would be better if this could be fixed.

--
bug
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
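For readers following the sizes in the listing above, the layout arithmetic can be checked with a short Python sketch. The byte counts are copied from the listing; the dictionary layout and the helper name journal_layout are made up for illustration, and reading the spare 512 bytes as one sector reserved for on-disk metadata is an assumption, not something the post states.

```python
# Byte counts copied from the geom listing in the message above.
# Names and structure here are illustrative only.
SECTOR = 512

MIRROR_PROVIDER = 500107861504  # mirror/gm0
MIRROR_CONSUMERS = {"ad5": 500107862016, "ad6": 500107862016}

JOURNALS = {
    "mirror/gm0f": {"consumer": 107374182400, "provider": 106300440064,
                    "jstart": 106300440064, "jend": 107374181888},
    "mirror/gm0g": {"consumer": 193273528320, "provider": 192199785984,
                    "jstart": 192199785984, "jend": 193273527808},
    "mirror/gm0h": {"consumer": 194135098368, "provider": 193061356032,
                    "jstart": 193061356032, "jend": 194135097856},
}


def journal_layout(j):
    """Derive the size of each region from one journal's offsets."""
    return {
        # Space between Jstart and Jend, i.e. the journal itself.
        "journal_bytes": j["jend"] - j["jstart"],
        # Space left after Jend (assumed: one sector of metadata).
        "trailing_bytes": j["consumer"] - j["jend"],
        # The .journal provider ends exactly where the journal starts.
        "data_matches_jstart": j["provider"] == j["jstart"],
    }


if __name__ == "__main__":
    for disk, size in MIRROR_CONSUMERS.items():
        print(disk, "is", size - MIRROR_PROVIDER, "bytes larger than mirror/gm0")
    for name, j in JOURNALS.items():
        print(name, journal_layout(j))
```

With these numbers, every journal spans the same number of bytes, and each consumer is exactly one sector larger than the region that ends at Jend.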
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
Kris Kennaway ([EMAIL PROTECTED]) on 05/10/2006 at 22:34 wrote:

> Based on successful testing on a machine with shared em interrupt, the following patch should work around the problem *in that case*.
> [...]
> Please let Scott and I know whether or not this patch works for you (in addition to the information previously requested, if you have not already sent it). Unfortunately it is only a workaround, but it points to an underlying problem with fast interrupt handlers on a shared irq that can be studied separately.

# mojito uptime
14:23 up 1:59, 4 users, load averages: 0,07 0,05 0,01
# mojito uname -v
FreeBSD 6.2-PRERELEASE #15: Fri Oct 6 12:11:36 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUG

Your patch fixes my em/nvidia issue. Thanks Kris

--
bug
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Scott Long ([EMAIL PROTECTED]) on 04/10/2006 at 14:49 wrote:

> > #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56 # OK
> > #
> > #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00 # BROKEN
> > ...
> > #*default release=cvs tag=RELENG_6 # BROKEN
> >
> > From sys commitlogs the culprit commits are:
> >
> > glebius 2006-08-08 09:19:25 utc
> > glebius 2006-08-08 09:20:26 utc
>
> So you tested before these two changes and after these two changes, yes?

Yes that's it.

> What about with just the first change and not the second? Anyways, I'm

Because building a kernel that only has the first change (2006-08-08 09:19:25) fails.

> Can you try a quick test? Reboot and press '6' at the FreeBSD loader menu. That will drop you to a prompt. Then enter the following line:
>
> set hint.apic.0.disabled=1

Done: synced to STABLE-6 as of this morning (9:00 UTC), built world and kernel, and booted with APIC disabled. Still the same freeze after starting X and loading a few tabs in Firefox.

Thanks for the suggestion Scott.

--
bug
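The narrowing-by-date procedure discussed in this exchange amounts to a binary search over checkout timestamps. The Python sketch below is only an illustration under stated assumptions: bisect_dates and is_broken are hypothetical names, the starting endpoints are illustrative, and the stand-in test simply pretends that every tree at or after the first commit timestamp quoted above is broken. On a real system each probe would mean syncing the sources to that date, rebuilding the kernel, and trying to reproduce the freeze.

```python
from datetime import datetime, timedelta


def bisect_dates(good, bad, is_broken, resolution=timedelta(minutes=1)):
    """Shrink the window (good, bad] until it is no wider than `resolution`.

    `good` must be a date known to work and `bad` a date known to fail.
    `is_broken(date)` is a caller-supplied test for one checkout date.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_broken(mid):
            bad = mid   # breakage is at or before mid
        else:
            good = mid  # breakage is after mid
    return good, bad


# Stand-in test (assumption): pretend the tree breaks at the timestamp of
# the first commit quoted in the thread. A real test is a build and reboot.
FIRST_QUOTED_COMMIT = datetime(2006, 8, 8, 9, 19, 25)


def simulated_is_broken(date):
    return date >= FIRST_QUOTED_COMMIT


if __name__ == "__main__":
    # Illustrative starting endpoints, not taken from a real bisection log.
    last_good, first_bad = bisect_dates(
        datetime(2006, 6, 23, 17, 0, 0),
        datetime(2006, 10, 2, 15, 24, 4),
        simulated_is_broken,
    )
    print("last good :", last_good)
    print("first bad :", first_bad)
```

Each probe halves the window, so a range of several months is narrowed to about a minute in under twenty builds. The search assumes a single transition from working to broken inside the range.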
Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:

> One thing this patch definitely did do though, is break the nvidia driver pretty badly. Couldn't keep the X server running for more than a minute before it froze solid. Lots of Xid: blah blah blah messages. Yes I remembered to rebuild the kernel module ;)

Hi,

Since rebuilding to 6.2-PRERELEASE

FreeBSD 6.2-PRERELEASE #1: Mon Oct 2 15:24:04 CEST 2006 DEBUG i386

on a box having em sharing an IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

interrupt                   total   rate
irq1: atkbd0                    5      0
irq14: ata0                    47      0
irq16: nvidia0 em+          86545    185
irq17: fwohci0                  7      0
irq21: twe0                  6426     13
cpu0: timer                927735   1986
Total                     1020765   2185

I freeze the box by starting Firefox, which reloads a few tabs I keep open in my session when under X. This is perfectly reproducible. From the logs, first I see:

Oct 2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 00010597
Oct 2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel
Oct 2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 00010598
Oct 2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 00010599
Oct 2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059a
Oct 2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059b
Oct 2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059c
Oct 2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059d
Oct 2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059e
Oct 2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 0001059f
Oct 2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a0

then come the watchdogs:

Oct 2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct 2 16:48:58 mojito kernel: em0: link state changed to UP
Oct 2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a1
Oct 2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a2
Oct 2 16:49:08 mojito kernel: em0: link state changed to UP
Oct 2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a3
Oct 2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:18 mojito kernel: em0: link state changed to UP
Oct 2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a4
Oct 2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:29 mojito kernel: em0: link state changed to UP
Oct 2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head Count 000105a5
Oct 2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:39 mojito kernel: em0: link state changed to UP
Oct 2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct 2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct 2 16:49:49 mojito kernel: em0: link state changed to UP

and the box ends up frozen less than a minute later.

The traffic on the Intel card can be low (pinging a host for a few dozen seconds), medium (reloading a few pages in the tabs of Firefox) or high (downloading several ISO images from our local FTP mirror): whatever I do, if both nvidia and em0 are in use, the box freezes.

Note that I can't freeze the box when doing several simultaneous big downloads or tarring up a lot of files while NOT running X. So I guess it is a shared nvidia/em IRQ issue.

FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.

The DEBUG kernconf is GENERIC + witness options enabled (but they do not help in this case).

I traced back to find which changeset introduced the trouble. The results are:

#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00 # OK
...
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56 # OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00 # BROKEN
...
#*default release=cvs tag=RELENG_6 # BROKEN

From sys commitlogs the culprit commits are:

glebius 2006-08-08 09:19:25 utc
freebsd src repository
modified files: