On Feb 11, 2017, at 12:34 PM, Kenneth Bogert <[email protected]> wrote:
>
> Hello all,
>
> I have been running a Rockstor 3.8.16-8 on an older Dell Optiplex for about a
> month. The system has four drives separated into two Raid1 filesystems
> (“pools” in Rockstor terminology). A few days ago I restarted it and noticed
> that the services (NFS, Samba, etc) weren’t working. Looking at dmesg, I saw:
>
> kernel: BTRFS error (device sdb): parent transid verify failed on
> 1721409388544 wanted 19188 found 83121
>
> and sure enough, one of the subvolumes on my main filesystem is corrupted.
> By corrupted I mean it can’t be accessed, deleted, or even looked at:
>
> ls -l
> kernel: BTRFS error (device sdb): parent transid verify failed on
> 1721409388544 wanted 19188 found 83121
> kernel: BTRFS error (device sdb): parent transid verify failed on
> 1721409388544 wanted 19188 found 83121
> ls: cannot access /mnt2/Primary/Movies: Input/output error
>
> total 16
> drwxr-xr-x 1 root      root         100 Dec 29 02:00 .
> drwxr-xr-x 1 root      root         208 Jan  3 12:05 ..
> drwxr-x--- 1 kbogert   root         698 Feb  6 08:49 Documents
> drwxr-xrwx 1 root      root         916 Jan  3 12:54 Games
> drwxr-xrwx 1 xenserver xenserver   2904 Jan  3 12:54 ISO
> d????????? ? ?         ?              ?            ? Movies
> drwxr-xrwx 1 root      root      139430 Jan  3 12:53 Music
> drwxr-xrwx 1 root      root       82470 Jan  3 12:53 RawPhotos
> drwxr-xr-x 1 root      root          80 Jan  1 04:00 .snapshots
> drwxr-xrwx 1 root      root          72 Jan  3 13:07 VMs
>
> The input/output error is given for any operation on Movies.
>
> Luckily there has been no data loss that I am aware of. As it turns out I
> have a snapshot of the Movies subvolume taken a few days before the incident.
> I was able to simply cp -a all files off of the entire filesystem, with no
> reported errors, and verified a handful of them. Note that the transid error
> in dmesg alternates between sdb and sda5 after each startup.
>
>
> SETUP DETAILS
>
> uname -a
> Linux ironmountain 4.8.7-1.el7.elrepo.x86_64 #1 SMP Thu Nov 10 20:47:24 EST
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs --version
> btrfs-progs v4.8.3
>
> btrfs dev scan
> kernel: BTRFS: device label Primary devid 1 transid 83461 /dev/sdb
> kernel: BTRFS: device label Primary devid 2 transid 83461 /dev/sda5
>
> btrfs fi show /mnt2/Primary
> Label: 'Primary'  uuid: 21e09dd8-a54d-49ec-95cb-93fdd94f0c17
>         Total devices 2 FS bytes used 943.67GiB
>         devid    1 size 2.73TiB used 947.06GiB path /dev/sdb
>         devid    2 size 2.70TiB used 947.06GiB path /dev/sda5
>
> btrfs dev usage /mnt2/Primary
> /dev/sda5, ID: 2
>    Device size:             2.70TiB
>    Device slack:              0.00B
>    Data,RAID1:            944.00GiB
>    Metadata,RAID1:          3.00GiB
>    System,RAID1:           64.00MiB
>    Unallocated:             1.77TiB
>
> /dev/sdb, ID: 1
>    Device size:             2.73TiB
>    Device slack:              0.00B
>    Data,RAID1:            944.00GiB
>    Metadata,RAID1:          3.00GiB
>    System,RAID1:           64.00MiB
>    Unallocated:             1.80TiB
>
>
> btrfs fi df /mnt2/Primary
> Data, RAID1: total=944.00GiB, used=942.60GiB
> System, RAID1: total=64.00MiB, used=176.00KiB
> Metadata, RAID1: total=3.00GiB, used=1.07GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> This server is very light use, however, I do have a number of VMs in the VMs
> filesystem, exported over NFS, that are used by a Xenserver. These are not
> marked nocow, though they probably should have been. At the time of restart
> no VMs were running.
>
> I have deviated from Rockstor’s default setup a bit. They take an
> “appliance” view and try to enforce btrfs partitions that cover entire disks.
> I installed Rockstor onto /dev/sda4, created the Primary partition on
> /dev/sdb using Rockstor’s gui, then on the command line added /dev/sda5 to it
> and converted to raid1. As far as I can tell Rockstor is just CentOS 7 with
> a few updated utilities and a bunch of python scripts for providing a web
> interface to btrfs-progs.
> I have it set up to take monthly snapshots and do
> monthly scrubs, with the exception of the Documents subvolume which takes
> daily snapshots. These are all readonly and go in the .snapshots directory.
> Rockstor automatically deletes old snapshots once a limit is reached (7 daily
> snapshots, for instance).
>
> Side note, btrfs-progs 4.8.3 apparently has problems with CentOS 7’s glibc:
> https://github.com/rockstor/rockstor-core/issues/1608 . I have confirmed
> that bug in my own compiled version of 4.8.3, and that 4.9.1 does not have it.
>
>
> WHAT I’VE TRIED AND RESULTS
>
> First off, I have created an image with btrfs-image that I can make available
> (though large, I believe it was a few GB and the filesystem is 3 TB)
>
> * btrfs-zero-log
> had no discernible effect.
>
>
> * At this point, I compiled btrfs-progs 4.9.1. The following commands were
> run with this version:
>
>
> * btrfs check
> This exits in an assert fairly quickly:
> checking extents
> cmds-check.c:5406: check_owner_ref: BUG_ON `rec->is_root` triggered, value 1
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x42139b]
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x421483]
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x430529]
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x43160c]
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x435d6f]
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x43ab71]
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x43b065]
> /mnt/usb/btrfs-progs-bin/bin/btrfs(cmd_check+0xbbc)[0x441b82]
> /mnt/usb/btrfs-progs-bin/bin/btrfs(main+0x12b)[0x40a734]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff6fa7b35]
> /mnt/usb/btrfs-progs-bin/bin/btrfs[0x40a179]
>
> Full backtrace is attached as btrfsck_debug.log
>
> * btrfs check --mode lowmem
> This outputs a large number of errors before finally segfaulting.
> Full backtrace attached as btrfsck_lowmem_debug.log
>
> * btrfs scrub
> This completes with no errors.
>
>
> * Memtest86 completed more than 6 passes with no errors (left it running for
> a day)
>
> * No SMART errors, btrfs device stats shows no errors. The drives the
> filesystem is on are brand new.
>
> * I have tried to recreate the problem by installing Rockstor into a number
> of VMs and redoing my steps, no such luck.
>
>
> The main Rockstor partition (btrfs), as well as the other Raid1 partition on
> completely separate drives were not affected. I can provide any other logs
> requested.
>
> Help would be greatly appreciated!
>
>
> Kenneth Bogert
>
> <btrfsck_lowmem_debug.log><btrfsck_debug.log>
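
For anyone triaging a similar log, the repeated dmesg lines quoted above can be collapsed into one entry per (device, block, wanted, found) combination. The following is only a minimal sketch: the function name summarize_transid_errors and the sample text are made up for illustration, and the regular expression assumes the single-line message format shown in the quoted dmesg output (other kernel versions may word it differently).

```python
import re
from collections import Counter

# Pattern for the message format quoted in this thread (an assumption;
# other kernels may format the message differently).
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): parent transid verify failed on "
    r"(?P<bytenr>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def summarize_transid_errors(log_text):
    """Count identical transid failures in a chunk of dmesg output."""
    counts = Counter()
    for m in TRANSID_RE.finditer(log_text):
        key = (m["dev"], int(m["bytenr"]), int(m["wanted"]), int(m["found"]))
        counts[key] += 1
    return counts


# Hypothetical sample: the two identical lines from the ls -l attempt,
# joined onto single lines as dmesg would print them.
SAMPLE_LOG = """\
kernel: BTRFS error (device sdb): parent transid verify failed on 1721409388544 wanted 19188 found 83121
kernel: BTRFS error (device sdb): parent transid verify failed on 1721409388544 wanted 19188 found 83121
"""

if __name__ == "__main__":
    for (dev, bytenr, wanted, found), n in summarize_transid_errors(SAMPLE_LOG).items():
        print(f"{dev}: block {bytenr} wanted gen {wanted}, found gen {found} ({n}x)")
```

Feeding it the full dmesg from several boots would also show whether the same block is reported on both sdb and sda5, as described in the quoted message.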

As a small update to this problem, here is the output of btrfs subvolume list
(with 4.9.1):

The snapshot for the Movies subvolume is at gen 73808 but Movies is 19188?

ID 259 gen 83464 cgen 39 parent 5 top level 5 parent_uuid - path Music
ID 260 gen 19188 cgen 40 parent 5 top level 5 parent_uuid - path Movies
ID 261 gen 73808 cgen 41 parent 5 top level 5 parent_uuid - path ISO
ID 262 gen 73864 cgen 42 parent 5 top level 5 parent_uuid - path RawPhotos
ID 263 gen 83456 cgen 44 parent 5 top level 5 parent_uuid - path VMs
ID 601 gen 73810 cgen 356 parent 5 top level 5 parent_uuid - path Games
ID 882 gen 83462 cgen 526 parent 5 top level 5 parent_uuid - path Documents
ID 2104 gen 44513 cgen 44513 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_1
ID 2111 gen 55190 cgen 55190 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_weekly_201701220542
ID 2121 gen 68569 cgen 68569 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_weekly_201701290542
ID 2122 gen 68593 cgen 68593 parent 5 top level 5 parent_uuid 4e131f43-6ccb-7449-89ed-0d00b761cb08 path .snapshots/VMs/VMs_201701290600
ID 2124 gen 71873 cgen 71873 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_201701310400
ID 2125 gen 73705 cgen 73705 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_201702010400
ID 2126 gen 73808 cgen 73808 parent 5 top level 5 parent_uuid 1d82b662-f291-b340-9424-804fa431a03b path .snapshots/ISO/ISO_201702010500
ID 2127 gen 73808 cgen 73808 parent 5 top level 5 parent_uuid 915e8022-4cf3-084b-8ac6-504822a168c4 path .snapshots/Movies/movies_201702010500
ID 2128 gen 73810 cgen 73810 parent 5 top level 5 parent_uuid adcb63c8-ee55-8b49-8f7a-aed491aab7e6 path .snapshots/Games/games_201702010500
ID 2129 gen 73811 cgen 73811 parent 5 top level 5 parent_uuid e23f7432-fc89-c849-a2f2-4280cefabcf7 path .snapshots/Music/music_201702010500
ID 2130 gen 73864 cgen 73864 parent 5 top level 5 parent_uuid 67dc081c-cf8e-a444-8c8f-7899865e2f08 path .snapshots/RawPhotos/rawphotos_201702010530
ID 2131 gen 73865 cgen 73865 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_monthly_201702010530
ID 2132 gen 73920 cgen 73920 parent 5 top level 5 parent_uuid 4e131f43-6ccb-7449-89ed-0d00b761cb08 path .snapshots/VMs/VMs_201702010600
ID 2133 gen 75516 cgen 75516 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_201702020400
ID 2134 gen 77397 cgen 77397 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_201702030400
ID 2135 gen 79229 cgen 79229 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_201702040400
ID 2136 gen 81109 cgen 81109 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_201702050400
ID 2137 gen 81246 cgen 81246 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_weekly_201702050542
ID 2138 gen 81273 cgen 81273 parent 5 top level 5 parent_uuid 4e131f43-6ccb-7449-89ed-0d00b761cb08 path .snapshots/VMs/VMs_201702050600
ID 2139 gen 82966 cgen 82966 parent 5 top level 5 parent_uuid 212f71b3-21a2-274c-b080-86f262f50ccb path .snapshots/Documents/documents_daily_201702060400

Kenneth Bogert
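
The gen mismatch noted above (Movies at gen 19188 while its snapshot was created at gen 73808) can be checked mechanically against the rest of the listing. What follows is a rough sketch, not a diagnostic tool: the helper names parse_subvolumes and lagging_sources are invented here, it assumes the one-record-per-line field layout shown in the listing and the .snapshots/<Subvolume>/<name> directory layout, and it assumes a healthy source subvolume has a gen at least as large as the cgen of any snapshot taken from it. That last assumption is consistent with every other row in the listing but is not something confirmed in this thread.

```python
import re

# One record per line, in the field layout shown in the listing above.
LINE_RE = re.compile(
    r"ID (?P<id>\d+) gen (?P<gen>\d+) cgen (?P<cgen>\d+) .* path (?P<path>\S+)$"
)


def parse_subvolumes(text):
    """Turn subvolume-list text into a list of dicts with numeric gens."""
    subvols = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            subvols.append({
                "id": int(m["id"]),
                "gen": int(m["gen"]),
                "cgen": int(m["cgen"]),
                "path": m["path"],
            })
    return subvols


def lagging_sources(subvols, snapshot_dir=".snapshots"):
    """Map source path -> (source gen, newest snapshot cgen) where gen < cgen.

    Snapshots are matched to sources by directory name only, which is an
    assumption about the layout, not something btrfs guarantees.
    """
    by_path = {s["path"]: s for s in subvols}
    lagging = {}
    for snap in subvols:
        parts = snap["path"].split("/")
        if len(parts) != 3 or parts[0] != snapshot_dir:
            continue
        source = by_path.get(parts[1])
        if source is not None and source["gen"] < snap["cgen"]:
            newest = max(snap["cgen"], lagging.get(source["path"], (0, 0))[1])
            lagging[source["path"]] = (source["gen"], newest)
    return lagging


# A few rows copied from the listing above.
SAMPLE = """\
ID 259 gen 83464 cgen 39 parent 5 top level 5 parent_uuid - path Music
ID 260 gen 19188 cgen 40 parent 5 top level 5 parent_uuid - path Movies
ID 261 gen 73808 cgen 41 parent 5 top level 5 parent_uuid - path ISO
ID 2126 gen 73808 cgen 73808 parent 5 top level 5 parent_uuid 1d82b662-f291-b340-9424-804fa431a03b path .snapshots/ISO/ISO_201702010500
ID 2127 gen 73808 cgen 73808 parent 5 top level 5 parent_uuid 915e8022-4cf3-084b-8ac6-504822a168c4 path .snapshots/Movies/movies_201702010500
ID 2129 gen 73811 cgen 73811 parent 5 top level 5 parent_uuid e23f7432-fc89-c849-a2f2-4280cefabcf7 path .snapshots/Music/music_201702010500
"""

if __name__ == "__main__":
    for path, (gen, cgen) in lagging_sources(parse_subvolumes(SAMPLE)).items():
        print(f"{path}: gen {gen} is older than snapshot cgen {cgen}")
```

On the sample rows only Movies is reported. Its gen of 19188 is also the "wanted" value in the transid error, while ISO and Music sit at or above the cgen of their snapshots.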
