On Fri, 5 Mar 2021 at 05:38, Anand Jain <anand.j...@oracle.com> wrote:
>
> On 05/03/2021 15:15, Alexandru Stan wrote:
> > Hello,
> >
> > My raid1 btrfs fs went read-only recently. It consisted of 2 drives:
> > /dev/sda ST4000VN008 (firmware SC60) - 6-month-old drive
> > /dev/sdb ST4000VN000 (firmware SC44) - 5-year-old drive (but it was
> > mostly idly spinning; very few accesses were done in that time)
> > The drives are pretty similar (size/performance/market segment/rpm),
> > but they're of different generations.
> >
> > FWIW kernel is v5.11.2 (https://archlinux.org/packages/core/x86_64/linux/)
> >
> > I noticed something was wrong when the filesystem went read-only.
> > Dmesg showed a single error from about 50 min earlier:
> >> Mar 04 19:04:13 kernel: BTRFS critical (device sda3): corrupt leaf:
> >> block=4664769363968 slot=17 extent bytenr=4706905751552 len=8192 invalid
> >> extent refs, have 1 expect >= inline 129
> >> Mar 04 19:04:13 kernel: BTRFS info (device sda3): leaf 4664769363968 gen
> >> 1143228 total ptrs 112 free space 6300 owner 2
> >> Mar 04 19:04:14 kernel: item 0 key (4706904485888 168 8192)
> >> itemoff 16230 itemsize 53
> >> Mar 04 19:04:14 kernel: extent refs 1 gen 1123380 flags 1
> >> Mar 04 19:04:14 kernel: ref#0: extent data backref root
> >> 431 objectid 923767 offset 175349760 count 1
> > There were no other ATA errors nearby, and there wasn't much activity
> > going on around that time either.
> >
> > I tried to remount everything using the fstab, but it wasn't too happy:
> >> ~% sudo mount -a
> >> mount: /mnt/fs: wrong fs type, bad option, bad superblock on /dev/sdb3,
> >> missing codepage or helper program, or other error.
> > I regret not checking dmesg after that command; that was careless of
> > me (though I do have dmesg output of this later on).
> >
> > Catting /dev/sda seemed just fine, so at least one could still read
> > from the supposedly bad drive. I also think that the error message
> > just above always lists a random (per boot) drive of the array, not
> > necessarily the one causing problems, which scared me for a second
> > there.
> >
> > The next "bright" idea I had was that maybe this was a small bad
> > block on /dev/sda, and what were the chances that the array would
> > try to write to that spot again? Maybe the next reboot would be
> > fine. So I just rebooted.
> >
> > The system didn't come back up anymore (and neither did my 3000-mile
> > ssh access that was dear to me). Since my rootfs was on that array,
> > I was dumped into an initrd shell. Any attempts to mount were met
> > with more scary superblock errors (even when I tried /dev/sdb).
> >
> > This time I checked dmesg:
> >> BTRFS info (device sda3): disk space caching is enabled
> >> BTRFS info (device sda3): has skinny extents
> >> BTRFS info (device sda3): start tree-log replay
> >> BTRFS error (device sda3): parent transid verify failed on 4664769363968
> >> wanted 1143228 found 1143173
> >> BTRFS error (device sda3): parent transid verify failed on 4664769363968
> >> wanted 1143228 found 1143173
> >> BTRFS: error (device sda3) in btrfs_free_extent:3103: errno=-5 IO failure
> >> BTRFS: error (device sda3) in btrfs_run_delayed_refs:2171: errno=-5 IO
> >> failure
> >> BTRFS warning (device sda3): Skipping commit of aborted transaction.
> >> BTRFS: error (device sda3) in cleanup_transaction:1938: errno=-5 IO failure
> >> BTRFS: error (device sda3) in btrfs_replay_log:2254: errno=-5 IO failure
> >> (Failed to recover log tree)
> >> BTRFS error (device sda3): open_ctree failed
> > A fuller log (but not OCRed) can be found at
> > https://lh3.googleusercontent.com/-aV23XURv_f0/YEGLDeEavbI/AAAAAAAALYI/bFuSQsTYbCM7-z9SSNbcZq-7p1I7wGyLQCK8BGAsYHg/s0/2021-03-04.jpg,
> > though please excuse the format; I have to debug/fix this over VC.
> >
> > I managed to successfully mount by doing `mount -o
> > degraded,ro,norecovery,subvol=/root /new_root`. It seems to work
> > fine for RO access.
>
> From the parent transid verify failure it looks like one disk did not
> receive a few writes. A complete dmesg log would be better for
> understanding the root cause.

My dmesg only prints at most that snippet every time I try to mount the
fs. Is there any other debugging I should enable for this? I doubt
there's any way to get the original dmesg (besides that first "corrupt
leaf" snippet I posted) from before the reboot; I assume the system
didn't have a chance to write those logs to the rootfs since it had
gone RO.
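One thing I can set up before the next mount attempt is netconsole, so
that kernel messages survive a rootfs that goes RO. A rough sketch,
untested on this machine (the IPs, MAC, and interface name are
placeholders for my network):

    # On the sick machine: send kernel messages over UDP.
    # Syntax: netconsole=<src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>
    modprobe netconsole netconsole=6665@192.168.1.50/eth0,6666@192.168.1.60/aa:bb:cc:dd:ee:ff
    # Make sure all kernel messages reach the consoles.
    dmesg -n 8

    # On a second box: capture everything arriving on UDP port 6666.
    nc -u -l 6666 | tee btrfs-dmesg.txt

(netcat flags vary by flavor; traditional netcat wants `nc -u -l -p 6666`.)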
>
> Thanks.

> > I can't really boot anything from this, though: systemd refuses to
> > go past what the fstab dictates, and I'm stuck without either a root
> > password for the emergency shell (which I don't even have) or a way
> > to change the fstab (which I don't think I could get right in that
> > one RW attempt).
> >
> > I used a chroot in that RO mount to start a long smart scan of both
> > drives. I guess I'll find results in a couple of hours.

Long smart scan completed with no errors on both drives.

> > In the meantime I ordered another ST4000VN008 drive for more room
> > for activities; maybe I can do a `btrfs replace` if needed.
> >
> > I was on irc/#btrfs earlier; Zygo mentioned that these (at least the
> > later transid verify errors) are very strange and point to either
> > drive firmware, RAM, or kernel bugs. Hoping this brings a fuller
> > picture. RAM might be a little suspect since it's a newish machine I
> > built, but I have run memtest86 on it for 12 hours with no problems.
> > No ECC though.
> >
> > My questions:
> > * If both my drives' SMART runs report no errors, how do I recover
> >   my array? Ideally I would do this in place.
> > * Any suggestions on how to use my new third drive to make things
> >   safer?
> > * I would be OK with doing a 3-device raid1 in the future; would
> >   that protect me from something similar while not degrading to RO?
> >
> > When this is all over I'm setting up the daily btrbk remote
> > snapshots that I've been putting off, for extra peace of mind (then
> > I'll have my data copied across 5 drives in total).
> >
> > Thanks,
> > Alexandru Stan

Alexandru Stan
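P.S. Re: the new third drive (my second and third questions above):
once I can mount RW again, these are the two options I'm considering,
sketched with guessed device names (I'm assuming the new drive will
show up as /dev/sdc with a matching partition layout):

    # Option A: replace the suspect drive outright; this runs online
    # while the fs is mounted and rebuilds the data onto the new disk.
    btrfs replace start /dev/sdb3 /dev/sdc3 /mnt/fs
    btrfs replace status /mnt/fs

    # Option B: grow to a 3-device raid1, then balance so the existing
    # chunks get spread across all three drives.
    btrfs device add /dev/sdc3 /mnt/fs
    btrfs balance start --full-balance /mnt/fs
    btrfs filesystem usage /mnt/fs

IIRC btrfs raid1 keeps exactly two copies of each chunk no matter how
many devices are in the fs, so option B buys capacity and slack rather
than a third copy (a third copy would be raid1c3, which my 5.11 kernel
should already support).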
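P.P.S. The btrbk setup I've been putting off would look roughly like
this; a minimal sketch, with the hostname and paths as placeholders and
the retention numbers to taste:

    # /etc/btrbk/btrbk.conf
    transaction_log        /var/log/btrbk.log
    snapshot_preserve_min  2d
    snapshot_preserve      14d
    target_preserve_min    no
    target_preserve        20d 10w *m

    volume /mnt/fs
      snapshot_dir  btrbk_snapshots
      subvolume root
        target ssh://backup.example.com/mnt/backup/btrbk

Then `btrbk run` from a daily cron job or systemd timer; the
snapshot_dir has to be created by hand before the first run.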