Re: dump -X of large LVM based FFSv2 with WAPBL panics

Matthias Petermann Fri, 17 Nov 2017 22:17:38 -0800

Hello Jaromir,

Actually, I did a forced fsck on the respective FS upfront, while it was unmounted. To be sure, I just ran the command again; it passes with no errors the second time. When I run dump -X again, the panic still occurs.


Best regards,
Matthias


nuc# fsck -P /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is clean; not checking
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups

FREE BLK COUNT(S) WRONG IN SUPERBLK

SALVAGE? [yn] y

59411 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks, 0.0% fragmentation)


***** FILE SYSTEM WAS MODIFIED *****
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups

59411 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks, 0.0% fragmentation)

nuc# mount /p
nuc# touch /p/test.ignore
nuc# umount /p
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups

59412 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks, 0.0% fragmentation)

nuc#
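
As a sanity check on the fsck summary lines above, here is a minimal Python sketch (not part of the original session) that verifies the counts are internally consistent. All numbers are copied from the fsck output and from the `frag` and `blocks` fields of the dumpfs output quoted further down; the variable and function names are made up for illustration.

```python
# Values copied from the fsck summary line and the quoted dumpfs output.
FRAGS_PER_BLOCK = 8            # dumpfs: frag 8 (bsize 32768 / fsize 4096)
used_frags = 63408414          # fsck: "63408414 used"
free_frags_reported = 35694535 # fsck: "35694535 free"
loose_free_frags = 2079        # fsck: "2079 frags"
free_full_blocks = 4461557     # fsck: "4461557 blocks"
total_data_frags = 99102949    # dumpfs: "blocks 99102949"


def free_frags(full_blocks, loose_frags, frags_per_block=FRAGS_PER_BLOCK):
    """Total free space in fragments: whole free blocks plus loose fragments."""
    return full_blocks * frags_per_block + loose_frags


computed_free = free_frags(free_full_blocks, loose_free_frags)

# Both checks print True for the numbers in this thread.
print(computed_free == free_frags_reported)
print(used_frags + free_frags_reported == total_data_frags)
```

Both checks hold, so the free-space accounting looks consistent after the salvage; it says nothing about whether the snapshot path that dump -X uses is affected.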

On 15.11.2017 at 20:29, Jaromír Doleček wrote:

Hi,

can you check whether a full forced fsck (fsck -f) resolves this?

I've seen several such persistent panics when I was debugging WAPBL. Even after kernel fixes I had persistent panics around ffs_newvnode() due to disk data corruption from previous runs. This is worth trying.

Some day I plan to add a counter, so that boot would actually force an fsck every X boots even when the filesystem is clean, similar to what Linux does with ext3/4.


Jaromir

2017-11-15 12:56 GMT+01:00 Matthias Petermann <[email protected]>:


    Hello,

    on my system I have observed a serious panic when doing FFSv2 dumps
    under certain conditions. I did some googling on my own and found
    some references regarding the lead symptom

             "ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non
    zero blocks ffffffffffffff00 or size 0"

    but all of them ended up as solved back in 2016. So I wanted to
    share my observation here, in the hope somebody can give me some
    pointers how the issue could be narrowed down further.

    1) Given:

    - NetBSD 8.0_BETA (Kernel built from branches/netbsd-8 around
    2017-11-06)

             NetBSD nuc.local 8.0_BETA NetBSD 8.0_BETA (XEN3_DOM0_XHCI)
    #0: Mon Nov 6 14:31:17 CET 2017
    [email protected]:/s/src/sys/arch/amd64/compile/XEN3_DOM0_XHCI amd64

    - A large (392 GB) LVM volume hosting a FFSv2 filesystem with WAPBL
    enabled
       (/dev/mapper/vg0-photo mounted at /p)

    - (An external USB 3.0 Drive)

    2) What I tried:

    - make a dump of the aforementioned filesystem, using snapshots

         # dump -X -0auf /mnt/photo.0.dump /p

    3) What happens then:

    - the system crashes, leaving a coredump with the following
    indication:

         ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero
    blocks ffffffffffffff00 or size 0
         fatal page fault in supervisor mode
         trap type 6 code 0x2 rip 0xffffffff8022c0cc cs 0x8 rflags
    0x10246 cr2 0xfffffe82deaddf1d ilevel 0x3 rsp 0xfffffe810e6b1eb8
         curlwp 0xfffffe827f736000 pid 0.4 lowest kstack 0xfffffe810e6ae2c0
         panic: trap
         cpu0: Begin traceback...
         vpanic() at netbsd:vpanic+0x140
         snprintf() at netbsd:snprintf
         trap() at netbsd:trap+0xc6b
         --- trap (number 6) ---
         mutex_enter() at netbsd:mutex_enter+0xc
         biodone2() at netbsd:biodone2+0x9b
         biodone2() at netbsd:biodone2+0x9b
         biointr() at netbsd:biointr+0x3a
         softint_dispatch() at netbsd:softint_dispatch+0xd3
         DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e6b1ff0
         Xsoftintr() at netbsd:Xsoftintr+0x4f
         --- interrupt ---
         0:
         cpu0: End traceback...

         dumping to dev 0,1 (offset=168119, size=2076255):
         dump

    - gdb backtrace shows:

         (gdb) target kvm netbsd.3.core
         0xffffffff80229545 in cpu_reboot ()
         (gdb) bt
         #0  0xffffffff80229545 in cpu_reboot ()
         #1  0xffffffff809a4afc in vpanic ()
         #2  0xffffffff809a4bb0 in panic ()
         #3  0xffffffff8022b176 in trap ()
         #4  0xffffffff8020113e in alltraps ()
         #5  0xffffffff8022c0cc in mutex_enter ()
         #6  0xffffffff80a029f5 in wapbl_biodone ()
         #7  0xffffffff809e2f20 in biodone2 ()
         #8  0xffffffff809e2f20 in biodone2 ()
         #9  0xffffffff809e303e in biointr ()
         #10 0xffffffff8097bc1d in softint_dispatch ()
         #11 0xffffffff80223eef in Xsoftintr ()
         (gdb)

    4) What I tried afterwards:

    - make a dump of the aforementioned filesystem, using NO snapshots

         # dump -0auf /mnt/photo.0.dump /p

         -> works

    - umount the filesystem, enforcing a manual fsck

         -> no problems

    - dumpfs -s /dev/mapper/vg0-photo

         nuc# dumpfs -s /dev/mapper/vg0-photo
         file system: /dev/mapper/vg0-photo
         format  FFSv2
         endian  little-endian
         location 65536  (-b 128)
         magic   19540119        time    Wed Nov 15 12:26:52 2017
         superblock location     65536   id      [ 59f8026a 16319237 ]
         cylgrp  dynamic inodes  FFSv2   sblock  FFSv2   fslevel 5

         nbfree  4461561 ndir    1865    nifree  24770027        nffree  2079

         ncg     530     size    100663296       blocks  99102949
         bsize   32768   shift   15      mask    0xffff8000
         fsize   4096    shift   12      mask    0xfffff000
         frag    8       shift   3       fsbtodb 3
         bpg     23742   fpg     189936  ipg     46848
         minfree 5%      optim   time    maxcontig 2     maxbpg  4096
         symlinklen 120  contigsumsize 2
         maxfilesize 0x000800800805ffff
         nindir  4096    inopb   128
         avgfilesize 16384       avgfpdir 64
         sblkno  24      cblkno  32      iblkno  40      dblkno  2968
         sbsize  4096    cgsize  32768
         csaddr  2968    cssize  12288
         cgrotor 0       fmod    0       ronly   0       clean   0x01
         wapbl version 0x1       location 2      flags 0x0
         wapbl loc0 402688128    loc1 131072     loc2 512        loc3 3
         flags   none
         fsmnt   /p
         volname         swuid   0
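
The dumpfs fields above can be turned into byte sizes with a little arithmetic. The Python sketch below is illustrative only and not part of the original report. It assumes that, for an in-filesystem journal (`wapbl ... location 2`), `loc1` is the journal length in blocks and `loc2` is the journal block size in bytes; that interpretation is an assumption and should be checked against the WAPBL headers for the kernel in use.

```python
# Values copied from the dumpfs -s output quoted above.
FSIZE = 4096             # fragment size in bytes (fsize)
BSIZE = 32768            # block size in bytes (bsize)
SIZE_FRAGS = 100663296   # filesystem size in fragments (size)
DATA_FRAGS = 99102949    # data area in fragments (blocks)

# ASSUMPTION: for an in-filesystem log, loc1 = block count, loc2 = block size.
WAPBL_LOC1 = 131072
WAPBL_LOC2 = 512

fs_bytes = SIZE_FRAGS * FSIZE
data_bytes = DATA_FRAGS * FSIZE
log_bytes = WAPBL_LOC1 * WAPBL_LOC2

print(f"frags per block: {BSIZE // FSIZE}")
print(f"filesystem size: {fs_bytes / 2**30:.1f} GiB")
print(f"data area:       {data_bytes / 2**30:.1f} GiB")
print(f"journal size:    {log_bytes / 2**20:.1f} MiB (under the assumption above)")
```

Under these assumptions the `size` field corresponds to 384 GiB and the journal to 64 MiB.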

    5) Further observations:

    - dump -X of other FSs on the same machine seems to work fine, but
       these FSs are smaller

    I'd be glad to help narrow down the root cause further.

    Best regards,
    Matthias

    --
    Matthias Petermann <[email protected]> | www.petermann-it.de
    GnuPG: 0x5C3E6D75 | 5930 86EF 7965 2BBA 6572  C3D7 7B1D A3C3 5C3E 6D75


--
Matthias Petermann <[email protected]> | www.petermann-it.de
GnuPG: 0x5C3E6D75 | 5930 86EF 7965 2BBA 6572  C3D7 7B1D A3C3 5C3E 6D75
