Hello,

My raid1 btrfs fs went read only recently. It consisted of 2 drives:
/dev/sda ST4000VN008 (firmware SC60) - 6 month old drive
/dev/sdb ST4000VN000 (firmware SC44) - 5 year old drive (though it
spent most of that time idly spinning, with very few accesses)
The drives are pretty similar (size/performance/market segment/rpm),
but they're of different generations.

FWIW kernel is v5.11.2 (https://archlinux.org/packages/core/x86_64/linux/)

I noticed something was wrong when the filesystem turned read only.
Dmesg showed a single error from about 50 minutes earlier:
> Mar 04 19:04:13  kernel: BTRFS critical (device sda3): corrupt leaf: 
> block=4664769363968 slot=17 extent bytenr=4706905751552 len=8192 invalid 
> extent refs, have 1 expect >= inline 129
> Mar 04 19:04:13  kernel: BTRFS info (device sda3): leaf 4664769363968 gen 
> 1143228 total ptrs 112 free space 6300 owner 2
> Mar 04 19:04:14  kernel:         item 0 key (4706904485888 168 8192) itemoff 
> 16230 itemsize 53
> Mar 04 19:04:14  kernel:                 extent refs 1 gen 1123380 flags 1
> Mar 04 19:04:14  kernel:                 ref#0: extent data backref root 431 
> objectid 923767 offset 175349760 count 1
There were no other ATA errors nearby, and there wasn't much activity
going on around that time either.

I tried to remount everything using the fstab, but it wasn't too happy:
> ~% sudo mount -a
> mount: /mnt/fs: wrong fs type, bad option, bad superblock on /dev/sdb3, 
> missing codepage or helper program, or other error.
I regret not checking dmesg after that command, that was stupid of me
(though I do have dmesg output of this later on).

Catting /dev/sda seemed just fine, so at least the supposedly bad
drive could still be read from. I also think the error message just
above always lists an arbitrary (per boot) drive of the array, not
necessarily the one causing the problem, which scared me for a second
there.
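If a fuller read test would help, I can later run something along
these lines against both drives - though I haven't done that yet:

    # sequentially read the whole raw device and see if any sector errors out
    dd if=/dev/sda of=/dev/null bs=1M status=progress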

The next "bright" idea I had was maybe this was a small bad block on
/dev/sda and what are the chances that the array will try to write
again to that spot. Maybe the next reboot will be fine. So I just
rebooted.

The system didn't come back up anymore (and neither did my 3000 mile
ssh access that was dear to me). Since my rootfs was on that array I
was dumped to an initrd shell.
Any attempts to mount were met with more scary superblock errors
(even if I tried /dev/sdb).

This time I checked dmesg:
> BTRFS info (device sda3): disk space caching is enabled
> BTRFS info (device sda3): has skinny extents
> BTRFS info (device sda3): start tree-log replay
> BTRFS error (device sda3): parent transid verify failed on 4664769363968 
> wanted 1143228 found 1143173
> BTRFS error (device sda3): parent transid verify failed on 4664769363968 
> wanted 1143228 found 1143173
> BTRFS: error (device sda3) in btrfs_free_extent:3103: errno=-5 IO failure
> BTRFS: error (device sda3) in btrfs_run_delayed_refs:2171: errno=-5 IO failure
> BTRFS warning (device sda3): Skipping commit of aborted transaction.
> BTRFS: error (device sda3) in cleanup_transaction:1938: errno=-5 IO failure
> BTRFS: error (device sda3) in btrfs_replay_log:2254: errno=-5 IO failure
> (Failed to recover log tree)
> BTRFS error (device sda3): open_ctree failed
A fuller log (but not OCRd) can be found at
https://lh3.googleusercontent.com/-aV23XURv_f0/YEGLDeEavbI/AAAAAAAALYI/bFuSQsTYbCM7-z9SSNbcZq-7p1I7wGyLQCK8BGAsYHg/s0/2021-03-04.jpg,
though please excuse the format, I have to debug/fix this over VC.
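If it would help, I can also paste superblock dumps from both drives.
I believe something like this from the initrd shell should do it
(assuming btrfs-progs is available in there):

    # print the primary superblock of each device
    btrfs inspect-internal dump-super /dev/sda3
    btrfs inspect-internal dump-super /dev/sdb3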

I managed to successfully mount by doing `mount -o
degraded,ro,norecovery,subvol=/root /new_root`. Seems to work fine for
RO access.

I can't really boot anything from this though; systemd refuses to go
past what the fstab dictates, and I have neither a root password for
the emergency shell (I don't even have one set) nor a way to change
the fstab (which I don't think I could get right in a single RW
attempt).
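If I do get hands on the console over VC, my current plan is to try
overriding the root mount from the bootloader instead of touching the
fstab - something along these lines on the kernel command line (an
untested guess on my part, based on the mount options that worked
above):

    rootflags=subvol=/root,degraded,ro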

I used a chroot in that RO mount to start a long SMART self-test of
both drives. I should have results in a couple of hours.
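For reference, the scans are just the standard long self-tests,
kicked off with something like:

    smartctl -t long /dev/sda
    smartctl -t long /dev/sdb
    # and later, to read the results:
    smartctl -a /dev/sda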

In the meantime I ordered another ST4000VN008 drive for more room for
activities; maybe I can do a `btrfs replace` if needed.
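If it comes to that, my understanding is the replace would look
roughly like this once the new drive is partitioned (device names are
placeholders, and please correct me if a degraded array needs extra
options):

    # replace the suspect device with the new one, then watch progress
    btrfs replace start /dev/sda3 /dev/sdc3 /mnt/fs
    btrfs replace status /mnt/fs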

I was on irc/#btrfs earlier; Zygo mentioned that these (at least the
later transid verify errors) are very strange and point to either
drive firmware, RAM, or kernel bugs. Hoping this mail brings a fuller
picture. RAM might be a little suspect since it's a newish machine I
built, but I have run memtest86 on it for 12 hours with no problems.
No ECC though.

My questions:
* If both drives' SMART tests report no errors, how do I recover my
array? Ideally I would do this in place.
    * Any suggestions on how to use my new third drive to make things safer?
* I would be OK with doing a 3-device raid1 in the future (rough
sketch of what I mean below); would that protect me from something
similar while not degrading to RO?
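For that last point, what I have in mind is roughly the following,
once the array mounts RW again (the new drive's name is a
placeholder):

    # add the third device, then rebalance so existing raid1 chunks
    # get spread across all three drives
    btrfs device add /dev/sdc3 /mnt/fs
    btrfs balance start /mnt/fs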

When this is all over I'm setting up the daily btrbk remote snapshots
I've been putting off, for some extra peace of mind (then I'll have my
data copied on 5 drives in total).
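The btrbk config I have sketched out so far is roughly this
(untested, and the target host/paths are placeholders):

    # /etc/btrbk/btrbk.conf - rough draft
    snapshot_preserve_min   2d
    snapshot_preserve       14d
    target_preserve         20d 10w

    volume /mnt/fs
      snapshot_dir btrbk_snapshots
      subvolume root
        target ssh://backuphost/mnt/backup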

Thanks,
Alexandru Stan
