On Fri, 5 Mar 2021 at 05:38, Anand Jain <anand.j...@oracle.com> wrote:
>
> On 05/03/2021 15:15, Alexandru Stan wrote:
> > Hello,
> >
> > My raid1 btrfs fs went read-only recently. It consisted of 2 drives:
> > /dev/sda ST4000VN008 (firmware SC60) - 6-month-old drive
> > /dev/sdb ST4000VN000 (firmware SC44) - 5-year-old drive (but it was
> > mostly idly spinning; very few accesses were done in that time)
> > The drives are pretty similar (size/performance/market segment/rpm),
> > but they're of different generations.
> >
> > FWIW kernel is v5.11.2 (https://archlinux.org/packages/core/x86_64/linux/)
> >
> > I noticed something was wrong when the filesystem went read-only.
> > Dmesg showed a single error from about 50 min earlier:
> >> Mar 04 19:04:13 kernel: BTRFS critical (device sda3): corrupt leaf:
> >> block=4664769363968 slot=17 extent bytenr=4706905751552 len=8192 invalid
> >> extent refs, have 1 expect >= inline 129
> >> Mar 04 19:04:13 kernel: BTRFS info (device sda3): leaf 4664769363968 gen
> >> 1143228 total ptrs 112 free space 6300 owner 2
> >> Mar 04 19:04:14 kernel: item 0 key (4706904485888 168 8192)
> >> itemoff 16230 itemsize 53
> >> Mar 04 19:04:14 kernel: extent refs 1 gen 1123380 flags 1
> >> Mar 04 19:04:14 kernel: ref#0: extent data backref root
> >> 431 objectid 923767 offset 175349760 count 1
> > There were no other ATA errors nearby, and there wasn't much activity
> > going on around that time either.
> >
> > I tried to remount everything using the fstab, but it wasn't too happy:
> >> ~% sudo mount -a
> >> mount: /mnt/fs: wrong fs type, bad option, bad superblock on /dev/sdb3,
> >> missing codepage or helper program, or other error.
> > I regret not checking dmesg after that command; that was careless of
> > me (though I do have dmesg output of this later on).
> >
> > Catting /dev/sda seemed just fine, so at least one could still read
> > from the supposedly bad drive. I also think that the error message
> > just above always lists a random (per boot) drive of the array, not
> > necessarily the one causing problems, which scared me for a second
> > there.
> >
> > The next "bright" idea I had was that maybe this was a small bad
> > block on /dev/sda, and what were the chances that the array would
> > try to write to that spot again? Maybe the next reboot would be
> > fine. So I just rebooted.
> >
> > The system didn't come back up anymore (and neither did my 3000-mile
> > ssh access that was dear to me). Since my rootfs was on that array,
> > I was dumped into an initrd shell. Any attempts to mount were met
> > with more scary superblock errors (even when I tried /dev/sdb).
> >
> > This time I checked dmesg:
> >> BTRFS info (device sda3): disk space caching is enabled
> >> BTRFS info (device sda3): has skinny extents
> >> BTRFS info (device sda3): start tree-log replay
> >> BTRFS error (device sda3): parent transid verify failed on 4664769363968
> >> wanted 1143228 found 1143173
> >> BTRFS error (device sda3): parent transid verify failed on 4664769363968
> >> wanted 1143228 found 1143173
> >> BTRFS: error (device sda3) in btrfs_free_extent:3103: errno=-5 IO failure
> >> BTRFS: error (device sda3) in btrfs_run_delayed_refs:2171: errno=-5 IO
> >> failure
> >> BTRFS warning (device sda3): Skipping commit of aborted transaction.
> >> BTRFS: error (device sda3) in cleanup_transaction:1938: errno=-5 IO failure
> >> BTRFS: error (device sda3) in btrfs_replay_log:2254: errno=-5 IO failure
> >> (Failed to recover log tree)
> >> BTRFS error (device sda3): open_ctree failed
> > A fuller log (but not OCRed) can be found at
> > https://lh3.googleusercontent.com/-aV23XURv_f0/YEGLDeEavbI/AAAAAAAALYI/bFuSQsTYbCM7-z9SSNbcZq-7p1I7wGyLQCK8BGAsYHg/s0/2021-03-04.jpg,
> > though please excuse the format; I have to debug/fix this over VC.
> >
> > I managed to successfully mount by doing `mount -o
> > degraded,ro,norecovery,subvol=/root /new_root`. It seems to work
> > fine for RO access.
>
> From the parent transid verify failure it looks like one disk did not
> receive a few writes. A complete dmesg log would be better for
> understanding the root cause.

My dmesg only prints at most that snippet every time I try to mount the
fs. Is there any other debugging I should enable for this? I doubt
there's any way to get the original dmesg (besides that first "corrupt
leaf" snippet I posted) from before the reboot; I assume the system
didn't have a chance to write those logs to the rootfs since it had
gone RO.
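One thing I can set up before the next mount attempt is netconsole, so
that kernel messages survive a rootfs that goes RO. A rough sketch,
untested on this machine (the IPs, MAC, and interface name are
placeholders for my network):

    # On the sick machine: send kernel messages over UDP.
    # Syntax: netconsole=<src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>
    modprobe netconsole netconsole=6665@192.168.1.50/eth0,6666@192.168.1.60/aa:bb:cc:dd:ee:ff
    # Make sure all kernel messages reach the consoles.
    dmesg -n 8

    # On a second box: capture everything arriving on UDP port 6666.
    nc -u -l 6666 | tee btrfs-dmesg.txt

(netcat flags vary by flavor; traditional netcat wants `nc -u -l -p 6666`.)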
>
> Thanks.

> > I can't really boot anything from this, though: systemd refuses to
> > go past what the fstab dictates, and I'm stuck without either a root
> > password for the emergency shell (which I don't even have) or a way
> > to change the fstab (which I don't think I could get right in that
> > one RW attempt).
> >
> > I used a chroot in that RO mount to start a long smart scan of both
> > drives. I guess I'll find results in a couple of hours.

Long smart scan completed with no errors on both drives.

> > In the meantime I ordered another ST4000VN008 drive for more room
> > for activities; maybe I can do a `btrfs replace` if needed.
> >
> > I was on irc/#btrfs earlier; Zygo mentioned that these (at least the
> > later transid verify errors) are very strange and point to either
> > drive firmware, RAM, or kernel bugs. Hoping this brings a fuller
> > picture. RAM might be a little suspect since it's a newish machine I
> > built, but I have run memtest86 on it for 12 hours with no problems.
> > No ECC though.
> >
> > My questions:
> > * If both my drives' SMART runs report no errors, how do I recover
> >   my array? Ideally I would do this in place.
> > * Any suggestions on how to use my new third drive to make things
> >   safer?
> > * I would be OK with doing a 3-device raid1 in the future; would
> >   that protect me from something similar while not degrading to RO?
> >
> > When this is all over I'm setting up the daily btrbk remote
> > snapshots that I've been putting off, for extra peace of mind (then
> > I'll have my data copied across 5 drives in total).
> >
> > Thanks,
> > Alexandru Stan

Alexandru Stan
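P.S. Re: the new third drive (my second and third questions above):
once I can mount RW again, these are the two options I'm considering,
sketched with guessed device names (I'm assuming the new drive will
show up as /dev/sdc with a matching partition layout):

    # Option A: replace the suspect drive outright; this runs online
    # while the fs is mounted and rebuilds the data onto the new disk.
    btrfs replace start /dev/sdb3 /dev/sdc3 /mnt/fs
    btrfs replace status /mnt/fs

    # Option B: grow to a 3-device raid1, then balance so the existing
    # chunks get spread across all three drives.
    btrfs device add /dev/sdc3 /mnt/fs
    btrfs balance start --full-balance /mnt/fs
    btrfs filesystem usage /mnt/fs

IIRC btrfs raid1 keeps exactly two copies of each chunk no matter how
many devices are in the fs, so option B buys capacity and slack rather
than a third copy (a third copy would be raid1c3, which my 5.11 kernel
should already support).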
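P.P.S. The btrbk setup I've been putting off would look roughly like
this; a minimal sketch, with the hostname and paths as placeholders and
the retention numbers to taste:

    # /etc/btrbk/btrbk.conf
    transaction_log        /var/log/btrbk.log
    snapshot_preserve_min  2d
    snapshot_preserve      14d
    target_preserve_min    no
    target_preserve        20d 10w *m

    volume /mnt/fs
      snapshot_dir  btrbk_snapshots
      subvolume root
        target ssh://backup.example.com/mnt/backup/btrbk

Then `btrbk run` from a daily cron job or systemd timer; the
snapshot_dir has to be created by hand before the first run.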