On 2015-07-14 19:20, Chris Murphy wrote:
On Tue, Jul 14, 2015 at 7:25 AM, Austin S Hemmelgarn
<ahferro...@gmail.com> wrote:
On 2015-07-14 07:49, Austin S Hemmelgarn wrote:

So, after experiencing this same issue multiple times (on almost a dozen
different kernel versions since 4.0) and ruling out the possibility of it
being caused by my hardware (or at least, the RAM, SATA controller and disk
drives themselves), I've decided to report it here.

The general symptom is that raid6 profile filesystems that I have are
working fine for multiple weeks, until I either reboot or otherwise try to
remount them, at which point the system refuses to mount them.

I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been
seeing this with versions of both since 4.0.

Output of 'btrfs fi show' for the most recent fs that I had this issue
with:
        Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
                Total devices 4 FS bytes used 9.70GiB
                devid    1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0
                devid    2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1
                devid    3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2
                devid    4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3

        btrfs-progs v4.1

Each of the individual LVs in the FS is just a flat chunk of space on a
separate disk from the others.
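
For context, the layout was created along these lines (the PV names and the
exact lvcreate invocations here are illustrative, not my actual setup):

    # one linear LV per physical disk, all in the same VG
    lvcreate -n altroot.0 -L 24G vg /dev/sda2
    lvcreate -n altroot.1 -L 24G vg /dev/sdb2
    lvcreate -n altroot.2 -L 24G vg /dev/sdc2
    lvcreate -n altroot.3 -L 24G vg /dev/sdd2
    # raid6 for both data and metadata across the four LVs
    mkfs.btrfs -d raid6 -m raid6 /dev/mapper/vg-altroot.{0..3}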

The FS itself passes btrfs check just fine (no reported errors, exit value
of 0), but the kernel refuses to mount it with the message 'open_ctree
failed'.
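
Concretely, what I see looks roughly like this (using the first device of
the FS above; the mount point is just an example):

    btrfs check /dev/mapper/vg-altroot.0                   # no errors, exits 0
    mount -t btrfs /dev/mapper/vg-altroot.0 /mnt/altroot   # fails
    dmesg | tail -n 1                                      # "BTRFS: open_ctree failed"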

I've run 'btrfs rescue chunk-recover' and attached the output from that.

Here's a link to an image from 'btrfs-image -c9 -w':
https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0
(That link will expire in 30 days; let me know if you need access to it
beyond that.)
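
If it's easier to poke at locally, my understanding is that the metadata can
be rebuilt from that image onto a scratch file with something like:

    # restore the dump for inspection (assuming I have btrfs-image's
    # restore usage right)
    btrfs-image -r altroot.btrfs.img altroot.restored
    btrfs check altroot.restored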

The filesystems in question all see relatively light but consistent usage
as targets for receiving daily incremental snapshots for on-system backups.
(And because I know someone will mention it: yes, I do have other backups
of the data; these are just my online backups.)
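
The usage pattern is basically a daily incremental send/receive, along these
lines (subvolume, snapshot, and mount point names are made up for
illustration):

    # snapshot today, then send only the delta against yesterday's snapshot
    btrfs subvolume snapshot -r /data /data/.snapshots/today
    btrfs send -p /data/.snapshots/yesterday /data/.snapshots/today \
        | btrfs receive /mnt/altroot/backups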

Further update: I just tried mounting the filesystem from the image above
again, this time passing device= options for each device in the FS, and it
seems to be working fine now.  I've tried this with the other filesystems,
however, and they still won't mount.
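
For reference, the mount that worked looked roughly like this (the mount
point is illustrative):

    mount -t btrfs \
      -o device=/dev/mapper/vg-altroot.0,device=/dev/mapper/vg-altroot.1,device=/dev/mapper/vg-altroot.2,device=/dev/mapper/vg-altroot.3 \
      /dev/mapper/vg-altroot.0 /mnt/altroot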


And is it the same message with the usual suspects: recovery,
ro,recovery? How about degraded, even though it's not degraded? And
what about 'btrfs rescue zero-log'?
Yeah, same result for both, and zero-log didn't help (although that doesn't really surprise me, as the filesystem was cleanly unmounted).
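
For completeness, what I tried was along these lines (the mount point is
just an example):

    mount -o recovery /dev/mapper/vg-altroot.0 /mnt/altroot
    mount -o ro,recovery /dev/mapper/vg-altroot.0 /mnt/altroot
    mount -o degraded /dev/mapper/vg-altroot.0 /mnt/altroot
    btrfs rescue zero-log /dev/mapper/vg-altroot.0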

Of course it's weird that btrfs check doesn't complain, but mount
does. I don't understand that, so it's good you've got an image. If
either recovery or zero-log fixes the problem, my understanding is this
suggests hardware did something Btrfs didn't expect.
I've run into cases in the past where this happens, although not recently (the last time I remember it happening was back around 3.14, I think).  Interestingly, running check --repair in those cases did fix things, even though check didn't complain about any issues there either.

I've managed to get the other filesystems I was having issues with mounted again using the device= options plus clear_cache, after running 'btrfs dev scan' a couple of times.  It seems to me (at least from what I'm seeing) that there is some metadata that isn't synchronized properly between the disks.  I've heard mention from multiple sources of similar issues happening occasionally with raid1 back around kernels 3.16-3.17, where passing a different device to mount helped.
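
Roughly, the sequence that got them mounting again was as follows (device
names here follow the same pattern as the altroot FS above; the other
filesystems use their own VG/LV names, and the mount point is illustrative):

    btrfs device scan    # ran this a couple of times
    btrfs device scan
    mount -t btrfs \
      -o clear_cache,device=/dev/mapper/vg-altroot.0,device=/dev/mapper/vg-altroot.1,device=/dev/mapper/vg-altroot.2,device=/dev/mapper/vg-altroot.3 \
      /dev/mapper/vg-altroot.0 /mnt/altroot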

