On 7/9/20 1:51 PM, Eric Sandeen wrote:
On 7/6/20 12:07 AM, Chris Murphy wrote:
On Fri, Jul 3, 2020 at 8:40 PM Eric Sandeen <sand...@redhat.com>
wrote:

On 7/3/20 1:41 PM, Chris Murphy wrote:
SSDs can fail in weird ways. Some spew garbage as they're
failing, some go read-only. I've seen both. I don't have stats on
how common it is for an SSD to go read-only as it fails, but once
it happens you cannot fsck it. It won't accept writes. If it
won't mount, your only chance to recover data is some kind of
offline scrape tool. And Btrfs does have a very, very good scrape
tool in terms of its success rate - the UX is scary, but that can
and will improve.

Ok, you and Josef have both recommended the btrfs restore
("scrape") tool as a next recovery step after fsck fails, and I
figured we should check that out, to see if that alleviates the
concerns about recoverability of user data in the face of
corruption.

I also realized that mkfs of an image isn't representative of an
SSD system typical of Fedora laptops, so I added "-m single" to
mkfs, because this will be the mkfs.btrfs default on SSDs (right?).
Based on Josef's description of fsck's algorithm of throwing away
any block with a bad CRC, this seemed worth testing.
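
(You can see which profile mkfs picked in its summary output, or on a
mounted filesystem with something like the following; the mount point
is illustrative:)

    btrfs filesystem df /mnt
    # look for "Metadata, single: ..." - with single metadata there
    # is no second copy for the repair tools to fall back on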

I also turned fuzzing /down/ to hitting 2048 bytes of the 1G image at
random - at most 2048 of its ~262,000 4K blocks, or a bit less than 1%
of the filesystem blocks. This is 1/4 the fuzzing rate from the
original test.

So: -m single, fuzz 2048 bytes of the 1G image, run btrfsck --repair,
mount, mount with -o recovery, and then restore ("scrape") if all of
that fails, and see what we get.
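
For anyone who wants to poke at this themselves, a rough sketch of one
iteration (file names, the mount point, and the data set here are
illustrative - my actual harness differs in the details):

    # 1G image with single metadata, the current mkfs default on SSDs
    truncate -s 1G test.img
    mkfs.btrfs -m single test.img

    # put some known data on it so there's something to lose
    mount -o loop test.img /mnt/test
    cp -a /usr/share/doc /mnt/test/
    umount /mnt/test

    # flip 2048 randomly chosen bytes somewhere in the image
    for i in $(seq 2048); do
        dd if=/dev/urandom of=test.img bs=1 count=1 conv=notrunc \
           status=none seek=$(shuf -i 0-1073741823 -n 1)
    done

    # repair, mount, then mount with backup roots (the old -o recovery)
    btrfs check --repair test.img
    mount -o loop test.img /mnt/test
    mount -o loop,ro,usebackuproot test.img /mnt/test

    # last resort: scrape out whatever restore can find
    mkdir -p /tmp/restored
    btrfs restore test.img /tmp/restored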

What's the probability of this kind of corruption occurring in the
real world? If the probability is so low it can't practically be
computed, how do we assess the risk? And if we can't assess risk,
what's the basis of concern?

From 20 years of filesystem development experience, I know that people
run filesystem repair tools.  It's just a fact.  For a wide variety of
reasons - bugs, hardware errors, admin errors, you name it -
filesystems experience corruption and inconsistencies.  At that point
the administrator needs a path forward.

"people won't need to repair btrfs" is, IMHO, the position that needs
to be supported, not "filesystem repair tools should be robust."

I ran 50 loops, and got:

46 btrfsck failures
20 mount failures

So it ran btrfs restore 20 times; of those, 11 runs lost all or
substantially all of the files; 17 runs lost at least 1/3 of the
files.
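
("Lost" here means restore never produced the file; the accounting was
along these lines, with paths matching the sketch above - truncated
files would need a content comparison on top of this:)

    # count files present in the source tree but missing from the scrape
    diff <(cd /usr/share/doc && find . -type f | sort) \
         <(cd /tmp/restored/doc && find . -type f | sort) \
        | grep -c '^<'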

Josef states that the reliability of ext4, XFS, and Btrfs is in the
same ballpark. He also reports one case in 10 years in which he failed to
recover anything. How do you square that with 11 complete failures,
trivially produced? Is there even a reason to suspect there's
residual risk?

Extrapolating from Facebook's use cases to the Fedora desktop should be
approached with caution, IMHO.

I've provided evidence that if/when damage happens, for whatever reason,
btrfs is unable to recover in place far more often than other filesystems.

When metadata is single profile, Btrfs is basically an early warning
system.

The available research on uncorrectable errors - errors that the
drive's ECC does not catch - suggests that users are decently likely
to experience at least one block of corruption in the life of the
drive, and that it tends to get worse leading up to drive failure. But
there is much less chance of detecting this if the file system isn't
also checksumming the vastly larger payload on the drive: the data.
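
For concreteness, the detection I mean can be as simple as a periodic
scrub, which re-reads everything on the filesystem and verifies both
data and metadata checksums (the mount point is illustrative):

    btrfs scrub start -B /mnt   # -B: run in the foreground
    btrfs scrub status /mnt     # summary of checksum errors found
    btrfs device stats /mnt     # cumulative per-device error counters

With single copies scrub can only report corruption, not repair it,
but that's exactly the early-warning value.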

One of the problems in this whole discussion is the assumption that filesystem
inconsistencies only arise from disk bitflips etc; that's just not the case.

Look, I'm just providing evidence of what I've found when re-evaluating the
btrfs administration/repair tools.  I've found them to be quite weak.

From what I've gathered from these responses, btrfs is unique in that it is
/expected/ that if anything goes wrong, the administrator should be prepared
to scrape out remaining data, re-mkfs, and start over.  If that's acceptable
for the Fedora desktop, that's fine, but I consider it a risk that should not
be ignored when evaluating this proposal.


Agreed; it's the very first thing I said when I was asked what the downsides are. There's clearly more work to be done in the recovery arena. How often do disks fail for Fedora users? Do we have that data? Is this a real risk? Nobody can say, because Fedora doesn't have the data.

Facebook does, however, have that data, and it's a microscopically small percentage. I agree that Facebook is vastly different from Fedora from a recovery standpoint, but I think our workloads and hardware extrapolate to the normal Fedora user quite well. We drive the disks harder than the normal Fedora user does, of course, but in the end we're updating packages, taking snapshots, and building code. We're just doing it at 1000x what a normal Fedora user does.

Thanks,

Josef