Re: Fedora 33 System-Wide Change proposal: Make btrfs the default file system for desktop variants

Josef Bacik Fri, 26 Jun 2020 09:57:36 -0700

On 6/26/20 12:43 PM, Matthew Miller wrote:

On Fri, Jun 26, 2020 at 12:30:35PM -0400, Josef Bacik wrote:

Obviously the Facebook scale, recoverability, and workload is going
to be drastically different from a random Fedora user.  But hardware
wise we are pretty close, at least on the disk side.  Thanks,


Thanks. I guess it's really recoverability I'm most concerned with. I expect
that if one of these nodes has a metadata corruption that results in an
unbootable system, that's really no big deal in the big scheme of things.
It's a bigger deal to home users. :)

Sure, I've answered this a few different times with various members of the working group committee (or whatever they're called nowadays). I'll copy and paste what I said to them. The context is "what do we do with bad drives that blow up at the wrong time".

Now, as for what the average Fedora user does: I've also addressed that a bunch over the last few weeks, but instead of pasting like 9 emails I'll just summarize.

The UX of a completely fucked fs sucks, regardless of the file system. Systemd currently does not handle booting with a read-only file system (though apparently it soon will), which is essentially what you get when you have critical metadata corrupted. You are dumped to an emergency shell, and then you have to know what to do from there.

With ext4/xfs, you mount read only or you run fsck. With Btrfs you can do that too, but then there's a whole other level of options depending on how bad the disk is. I've written a lot of tools over the years (which are in btrfs-progs) to recover various levels of broken file systems, to the point that you can pretty drastically mess up a FS and I'll still be able to pull data from the disk.

But, again, the UX for this _sucks_. You have to know first of all that you should try mounting read only, and then you have to get something plugged into the box and copy your data over. And then assume the worst: you can't mount read only. Now with ext4/xfs that's it, you are done. With btrfs you are just getting started. You have several built-in mount options for recovering from different failures, all read only. But you have to know that they are there and how to use them.
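To make the "just getting started" part concrete, here is a minimal sketch of what such an escalation ladder could look like. The email does not name the options, so the specific choices (`usebackuproot`, `nologreplay`, and `btrfs restore` from btrfs-progs) are an assumption about what is meant, and the device and directory paths are hypothetical placeholders. The script only prints the commands; it does not run them.

```python
# Escalating, read-only recovery attempts for a damaged btrfs volume.
# DEVICE, MOUNTPOINT and RESCUE_DIR are hypothetical placeholders, and the
# selection of options is an assumption, not a list taken from the email.
DEVICE = "/dev/sdX"
MOUNTPOINT = "/mnt/broken"
RESCUE_DIR = "/mnt/usb/rescued"

RECOVERY_LADDER = [
    ("plain read-only mount",
     ["mount", "-o", "ro", DEVICE, MOUNTPOINT]),
    ("fall back to an older tree root",
     ["mount", "-o", "ro,usebackuproot", DEVICE, MOUNTPOINT]),
    ("skip log replay",
     ["mount", "-o", "ro,nologreplay", DEVICE, MOUNTPOINT]),
    ("copy files out without mounting",
     ["btrfs", "restore", DEVICE, RESCUE_DIR]),
]


def render(ladder):
    """Format each step as a numbered line: reason, then the command."""
    return [f"{i}. {why}: {' '.join(cmd)}"
            for i, (why, cmd) in enumerate(ladder, 1)]


for line in render(RECOVERY_LADDER):
    print(line)
```

Each step is only worth trying if the previous one fails, and every mount in the list is read only, so none of them should make a damaged file system worse.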

These things are easily addressed with documentation, but that's only so good. This sort of scenario needs to be baked into Fedora itself, because it's the same problem no matter which file system you use. Thanks,


Josef

Email elaborating on my comments about btrfs's sensitivity to bad hardware and how we test.


---------------

The fact is I can make any file system unmountable with the right corruption.
The only difference with btrfs is that our metadata is completely dynamic, while
xfs and ext4 are less so.  They overwrite the same blocks over and over
again, and there is simply less "important" metadata that the file system needs
in order to function.

The "problem" that btrfs has is its main strength: it does COW.  That means our
important metadata is constantly being re-written to different segments of the
disk.  So if you have a bad disk, you are much more likely to get unlucky and
end up with some core piece of metadata getting corrupted, and thus resulting in
a file system that cannot be mounted read/write.
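The effect can be illustrated with a toy probability model. This is only a sketch under strong assumptions that are not in the email: a fixed fraction of the disk is bad, each COW rewrite of a core metadata block lands on an independent, uniformly chosen location, and redundancy such as duplicated metadata is ignored. The function names and the bad-block fraction are made up for illustration.

```python
def p_hit_in_place(bad_fraction):
    """Metadata lives at one fixed location: it is either on a bad spot or not."""
    return bad_fraction


def p_hit_cow(bad_fraction, rewrites):
    """Each rewrite goes to a fresh location: chance that at least one is bad."""
    return 1.0 - (1.0 - bad_fraction) ** rewrites


BAD_FRACTION = 1e-6  # hypothetical: one bad block per million

for rewrites in (1, 1_000, 1_000_000):
    print(f"rewrites={rewrites:>9}  "
          f"in-place={p_hit_in_place(BAD_FRACTION):.6f}  "
          f"cow={p_hit_cow(BAD_FRACTION, rewrites):.6f}")
```

In this model the in-place risk stays constant while the COW risk grows with the number of rewrites, which is the "more likely to get unlucky" effect described above. Real disks and real allocators are far less uniform than this.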

Now you are much more likely to hit this in a data segment, because generally
speaking there's more data writes than metadata writes.  The thing I brought up
in the meeting last week was a potential downside for sure, but not something
that will be a common occurrence.  I just checked the fleet for this week and
we've had to reprovision 20 machines out of 138 machines that threw crc errors,
out of N total machines with btrfs fs'es, which is in the millions.  In the same
time period I have 15 xfs boxes that needed to be reprovisioned because of
metadata corruption, out of <100k machines that have xfs.  I don't have data on
ext4 because it doesn't exist in our fleet anymore.
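As a rough worked example, the weekly numbers above can be turned into rates. N is only given as "in the millions", so the sketch assumes the smallest such value, 1,000,000, which makes the btrfs rates upper bounds. The xfs fleet is "<100k", so dividing by 100,000 gives a lower bound on the xfs rate. Both constants are assumptions for illustration, not figures from the fleet.

```python
def rate_percent(events, machines):
    """Events per machine, expressed as a percentage."""
    return 100.0 * events / machines


BTRFS_MACHINES_ASSUMED = 1_000_000  # assumption: smallest value of "in the millions"
XFS_MACHINES_UPPER = 100_000        # the email says fewer than 100k

btrfs_crc = rate_percent(138, BTRFS_MACHINES_ASSUMED)
btrfs_reprovisioned = rate_percent(20, BTRFS_MACHINES_ASSUMED)
xfs_reprovisioned = rate_percent(15, XFS_MACHINES_UPPER)
share_of_crc_reprovisioned = rate_percent(20, 138)

print(f"btrfs machines with crc errors: at most  {btrfs_crc:.4f}%")
print(f"btrfs machines reprovisioned:   at most  {btrfs_reprovisioned:.4f}%")
print(f"xfs machines reprovisioned:     at least {xfs_reprovisioned:.4f}%")
print(f"crc-error machines reprovisioned: {share_of_crc_reprovisioned:.1f}%")
```

Under these assumptions both reprovision rates are a small fraction of a percent per week.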

As for testing, there are 8 tests in xfstests that utilize my dm-log-writes
target.  These tests mount the file system, do a random workload, and then
replay the workload one write at a time to validate the file system isn't left
in some intermediate broken state.  This simulates the case of weird things
happening but in a much more concrete and repeatable manner.
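The replay idea can be sketched with a toy model. This is not how dm-log-writes or the xfstests cases are implemented; it only illustrates the technique of recording writes and then checking consistency after every single one. The "disk" is a dict, block 0 is a made-up superblock that stores the block number of the tree root, and the consistency rule is invented for the example.

```python
def is_consistent(disk):
    """The superblock (block 0) must point at a block that has been written."""
    root = disk.get(0)
    return root is None or root in disk


def replay_and_check(write_log):
    """Replay writes one at a time; return the indices that leave a broken state."""
    disk = {}
    broken_at = []
    for i, (block, value) in enumerate(write_log):
        disk[block] = value
        if not is_consistent(disk):
            broken_at.append(i)
    return broken_at


# Correct ordering: write the new tree root first, then update the superblock.
good_log = [(7, "tree-root-v1"), (0, 7), (9, "tree-root-v2"), (0, 9)]
# Buggy ordering: the superblock points at a root that is not on disk yet.
bad_log = [(0, 7), (7, "tree-root-v1")]

print(replay_and_check(good_log))
print(replay_and_check(bad_log))
```

A crash after any write in `good_log` leaves a usable state, while a crash after the first write in `bad_log` does not, and because the log is replayed deterministically the failure is reproducible every time.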

There's 65 tests that utilize dm-flakey, which randomly corrupts or drops
writes, and again these are to test different scenarios that have given us
issues in the past.  There's more of these because up until a few years ago this
was our only mechanism for testing this class of failures.  I wrote
dm-log-writes to bring some determinism to our testing.

All of our file systems in Linux are extremely thoroughly tested for a variety
of power fail cases.  The only area where btrfs is more likely to screw up is
the case of bad hardware, and again we're not talking about a difference of huge
percentage points.  It's a trade-off.  You are trading a slightly increased
chance that bad hardware will result in a file system that cannot be mounted
read/write for the ability to detect silent corruption from your memory, CPU, or
storage device.  Thanks,

Josef
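The "detect silent corruption" side of that trade-off comes down to checksumming blocks on write and verifying them on read. Below is a minimal sketch of the idea. It uses `zlib.crc32` from the Python standard library as a stand-in checksum and keeps the checksum next to the data in a dict, both of which are simplifications assumed for illustration rather than a description of the btrfs on-disk format.

```python
import zlib


def write_block(data: bytes):
    """Store the data together with a checksum computed at write time."""
    return {"data": data, "csum": zlib.crc32(data)}


def read_block(block):
    """Return the data, or raise if it no longer matches its checksum."""
    if zlib.crc32(block["data"]) != block["csum"]:
        raise IOError("checksum mismatch: block is corrupt")
    return block["data"]


block = write_block(b"important file contents")
clean = read_block(block)

# Simulate bad hardware flipping a single bit after the write completed.
damaged = bytearray(block["data"])
damaged[0] ^= 0x01
block["data"] = bytes(damaged)

try:
    read_block(block)
    detected = False
except IOError:
    detected = True

print("silent corruption detected:", detected)
```

Without the stored checksum the flipped bit would be returned to the application as if nothing had happened.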


