On Fri, 12-Jul-2013 at 08:35:33 +0200, Konstantin Belousov wrote:
> On Fri, Jul 12, 2013 at 08:05:27AM +0200, Andre Albsmeier wrote:
> > On Fri, 12-Jul-2013 at 08:01:12 +0200, Konstantin Belousov wrote:
> > > On Fri, Jul 12, 2013 at 07:24:40AM +0200, Andre Albsmeier wrote:
> > > > On Thu, 04-Jul-2013 at 19:25:28 +0200, Konstantin Belousov wrote:
> > > > > On Thu, Jul 04, 2013 at 04:29:19PM +0200, Andre Albsmeier wrote:
> > > > > > OK, patch is applied. I will reboot the machine later
> > > > > > and see what happens tomorrow morning. However,
> > > > > > it might take a few days, since all was fine for the
> > > > > > last 2 weeks.
> > > > > > 
> > > > > > BTW, should this patch be used in general or is it just
> > > > > > for debugging? My understanding is that it is something
> > > > > > which could stay in the code...
> > > > > 
> > > > > Patch is to improve debugging.
> > > > > 
> > > > > I will probably commit it after the issue is closed.  The argument against
> > > > > the commit is that the change imposes a small performance penalty
> > > > > due to the save and restore of %ebp (I doubt that this is measurable
> > > > > by any means).  Also, arguably, such a change should be done for all
> > > > > functions in support.s, but bcopy() is the hot spot.
> > > > 
> > > > Got a new one, 2 hours old ;-)
> > > > 
> > > > GNU gdb 6.1.1 [FreeBSD]
> > > > Copyright 2004 Free Software Foundation, Inc.
> > > > GDB is free software, covered by the GNU General Public License, and you are
> > > > welcome to change it and/or distribute copies of it under certain conditions.
> > > > Type "show copying" to see the conditions.
> > > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > > This GDB was configured as "i386-marcel-freebsd"...
> > > > 
> > > > Unread portion of the kernel message buffer:
> > > > 
> > > > 
> > > > Fatal trap 12: page fault while in kernel mode
> > > > fault virtual address   = 0xcd5ec000
> > > > fault code              = supervisor write, page not present
> > > > instruction pointer     = 0x20:0xc07cb2fe
> > > > stack pointer           = 0x28:0xd82e45cc
> > > > frame pointer           = 0x28:0xd82e45d4
> > > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > > >                         = DPL 0, pres 1, def32 1, gran 1
> > > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > > current process         = 18714 (mksnap_ffs)
> > > > trap number             = 12
> > > > panic: page fault
> > > > KDB: stack backtrace:
> > > > db_trace_self_wrapper(c08207eb,d82e4418,c05fdfc9,c081df13,c08a82e0,...) at db_trace_self_wrapper+0x26/frame 0xd82e43e8
> > > > kdb_backtrace(c081df13,c08a82e0,c0801bfa,d82e4424,d82e4424,...) at kdb_backtrace+0x29/frame 0xd82e43f4
> > > > panic(c0801bfa,c0845a01,c2b067d4,1,1,...) at panic+0xc9/frame 0xd82e4418
> > > > trap_fatal(c0ff6000,cd5ec000,2,0,c08b6bf4,...) at trap_fatal+0x353/frame 0xd82e4458
> > > > trap_pfault(baa8454b,21510,0,c2b06620,c08b6bf0,...) at trap_pfault+0x2d7/frame 0xd82e44a0
> > > > trap(d82e458c) at trap+0x41a/frame 0xd82e4580
> > > > calltrap() at calltrap+0x6/frame 0xd82e4580
> > > > --- trap 0xc, eip = 0xc07cb2fe, esp = 0xd82e45cc, ebp = 0xd82e45d4 ---
> > > > bcopy(c36ed000,cd5e6000,8000,8000,c281b980,...) at bcopy+0x1a/frame 0xd82e45d4
> > > > ffs_snapshot(c2b35a90,c2ed0400,0,0,0,...) at ffs_snapshot+0x2933/frame 0xd82e490c
> > > > ffs_mount(c2b35a90,c322e200,ff,d82e4c08,c2ccbc8c,...) at ffs_mount+0x15ee/frame 0xd82e4a3c
> > > > vfs_donmount(c2b06620,10313108,0,c2b74d80,c2b74d80,...) at vfs_donmount+0x196b/frame 0xd82e4c2c
> > > > sys_nmount(c2b06620,d82e4ccc,c2b06908,d82e4c6c,c0605015,...) at sys_nmount+0x63/frame 0xd82e4c50
> > > > syscall(d82e4d08) at syscall+0x2ce/frame 0xd82e4cfc
> > > > Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xd82e4cfc
> > > > --- syscall (378, FreeBSD ELF32, sys_nmount), eip = 0x180bdf37, esp = 0xbfbfd65c, ebp = 0xbfbfddd8 ---
> > > > Uptime: 4d20h0m44s
> > > > Physical memory: 503 MB
> > > > Dumping 104 MB: 89 73 57 41 25 9
> > > > 
> > > > No symbol "stopped_cpus" in current context.
> > > > No symbol "stoppcbs" in current context.
> > > > #0  doadump (textdump=1) at pcpu.h:249
> > > > 249     pcpu.h: No such file or directory.
> > > >         in pcpu.h
> > > > (kgdb) where
> > > > #0  doadump (textdump=1) at pcpu.h:249
> > > > #1  0xc05fdddd in kern_reboot (howto=260) at /src/src-9/sys/kern/kern_shutdown.c:449
> > > > #2  0xc05fe028 in panic (fmt=<value optimized out>) at /src/src-9/sys/kern/kern_shutdown.c:637
> > > > #3  0xc07cd1d3 in trap_fatal (frame=0xd82e458c, eva=3445538816)
> > > >     at /src/src-9/sys/i386/i386/trap.c:1044
> > > > #4  0xc07cd4b7 in trap_pfault (frame=0xd82e458c, usermode=0, eva=3445538816)
> > > >     at /src/src-9/sys/i386/i386/trap.c:957
> > > > #5  0xc07ce05a in trap (frame=0xd82e458c) at /src/src-9/sys/i386/i386/trap.c:555
> > > > #6  0xc07ba88c in calltrap () at /src/src-9/sys/i386/i386/exception.s:170
> > > > #7  0xc07cb2fe in bcopy () at /src/src-9/sys/i386/i386/support.s:198
> > > > #8  0xc072be13 in ffs_snapshot (mp=0xc2b35a90, snapfile=0xc2ed0400 "s5-2013.07.12-03.15.01")
> > > >     at /src/src-9/sys/ufs/ffs/ffs_snapshot.c:793
> > > > #9  0xc0748e8e in ffs_mount (mp=0xc2b35a90) at /src/src-9/sys/ufs/ffs/ffs_vfsops.c:483
> > > > #10 0xc068a72b in vfs_donmount (td=0xc2b06620, fsflags=271659272, fsoptions=0xc2b74d80)
> > > >     at /src/src-9/sys/kern/vfs_mount.c:948
> > > > #11 0xc068a8e3 in sys_nmount (td=0xc2b06620, uap=0xd82e4ccc) at /src/src-9/sys/kern/vfs_mount.c:417
> > > > #12 0xc07cd7ae in syscall (frame=0xd82e4d08) at subr_syscall.c:135
> > > > #13 0xc07ba8f1 in Xint0x80_syscall () at /src/src-9/sys/i386/i386/exception.s:270
> > > > #14 0x00000033 in ?? ()
> > > > Previous frame inner to this frame (corrupt stack?)
> > > 
> > > Please show me the first 100 lines of the output of dumpfs(8) on the
> > > filesystem where snapshot creation caused the panic.
> > 
> > OK, dumpfs /dev/stripe/p | head -100:
> > 
> > magic       11954 (UFS1)    time    Fri Jul 12 08:02:40 2013
> > id  [ 517fa356 4ecc9335 ]
> > ncg 82      size    17774144        blocks  17737399
> > bsize       32768   shift   15      mask    0xffff8000
> > fsize       4096    shift   12      mask    0xfffff000
> > frag        8       shift   3       fsbtodb 3
> > minfree     8%      optim   time    symlinklen 60
> > maxbpg      4096    maxcontig 4     contigsumsize 4
> > nbfree      1958555 ndir    695     nifree  1123668 nffree  5395
> > cpg 1       bpg     27415   fpg     219320  ipg     13824
> > nindir      8192    inopb   256     nspf    8       maxfilesize 18016597801566207
> > sbsize      4096    cgsize  32768   cgoffset 0      cgmask  0xffffffff
> > csaddr      456     cssize  4096
> > rotdelay 0ms        rps     60      trackskew 0     interleave 1
> > nsect       1754560 npsect  1754560 spc     1754560
> > sblkno      8       cblkno  16      iblkno  24      dblkno  456
> > cgrotor     50      fmod    0       ronly   0       clean   0
> > metaspace 0 avgfpdir 64     avgfilesize 16384
> > flags       soft-updates 
> > fsmnt       /palveli
> > volname             swuid   0       providersize    17774144
> 
> UFS1, weird.

Hmm, why? I like UFS1 on my old and good (but small)
SCSI disks as long as I do not use ACLs or similar stuff.

> 
> I believe I see the problem.  The UFS1 superblock is not aligned on the
> fs block boundary, and the bcopy() call tried to do a full block copy.
> In fact, when the snapshotting operation did not trap, you probably
> got data corruption in an unrelated buffer.
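
The arithmetic behind this diagnosis can be sketched roughly as follows,
using bsize and sbsize from the dumpfs output and the addresses from the
trap above. This is only an illustration under stated assumptions: the
superblock byte offsets 8192 (UFS1) and 65536 (UFS2) are taken to be the
usual SBLOCK_UFS1 and SBLOCK_UFS2 values, and blk_offset(), overrun() and
selfcheck() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* From the dumpfs output quoted above. */
#define FS_BSIZE        32768u  /* bsize  */
#define FS_SBSIZE       4096u   /* sbsize */

/* Assumed standard superblock byte offsets (not shown in the thread). */
#define SBLOCKLOC_UFS1  8192u
#define SBLOCKLOC_UFS2  65536u

/* Offset of a byte location within its filesystem block; mirrors what
 * blkoff() yields for a power-of-two block size. */
static uint32_t blk_offset(uint32_t loc)
{
    return loc & (FS_BSIZE - 1u);
}

/* Number of bytes that land past the end of a one-block buffer when
 * len bytes are copied to offset off inside it. */
static uint32_t overrun(uint32_t off, uint32_t len)
{
    uint32_t end = off + len;
    return end > FS_BSIZE ? end - FS_BSIZE : 0u;
}

/* Checks the numbers reported in the panic against the model above. */
static void selfcheck(void)
{
    uint32_t loc   = blk_offset(SBLOCKLOC_UFS1);
    uint32_t dst   = 0xcd5e6000u;  /* 2nd bcopy() argument in the trace */
    uint32_t fault = 0xcd5ec000u;  /* fault virtual address */

    /* UFS1: the superblock starts 8192 bytes into its fs block. */
    assert(loc == 8192u);
    /* Copying fs_bsize bytes from there runs 8192 bytes past the buffer. */
    assert(overrun(loc, FS_BSIZE) == 8192u);
    /* Copying only fs_sbsize bytes stays inside the buffer. */
    assert(overrun(loc, FS_SBSIZE) == 0u);
    /* The first byte past the buffer is exactly the faulting address. */
    assert(dst - loc + FS_BSIZE == fault);
    /* UFS2 (assumed offset 65536) is block-aligned here, so a full
     * block copy would not overrun the destination buffer. */
    assert(overrun(blk_offset(SBLOCKLOC_UFS2), FS_BSIZE) == 0u);
}
```

Calling selfcheck() passes all assertions with these inputs. If the
assumed offsets hold, that would be consistent with the fault address
lying exactly at the end of the destination buffer, and with the panic
showing up only on UFS1 filesystems.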

OK, and this could probably explain why I saw a panic
when running a "dump -L /" on another machine which
also has UFS1.

> 
> Please try the patch below.

Patch applied, thanks. I also ran a snapshot operation which
succeeded. Now let's see what the future shows...

I'll see if I can retry the "dump -L" on several of my other
UFS1 boxes in the next few days...

Thanks,

        -Andre

> 
> diff --git a/sys/ufs/ffs/ffs_snapshot.c b/sys/ufs/ffs/ffs_snapshot.c
> index ad157aa..c37706b 100644
> --- a/sys/ufs/ffs/ffs_snapshot.c
> +++ b/sys/ufs/ffs/ffs_snapshot.c
> @@ -792,7 +792,7 @@ out1:
>               brelse(nbp);
>       } else {
>               loc = blkoff(fs, fs->fs_sblockloc);
> -             bcopy((char *)copy_fs, &nbp->b_data[loc], fs->fs_bsize);
> +             bcopy((char *)copy_fs, &nbp->b_data[loc], (u_int)fs->fs_sbsize);
>               bawrite(nbp);
>       }
>       /*



-- 
Unix is very userfriendly. It's just picky who its friends are.
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
