On 2014-10-12 06:14, Martin Steigerwald wrote:
On Friday, 10 October 2014, 10:37:44, Chris Murphy wrote:
On Oct 10, 2014, at 6:53 AM, Bob Marley <bobmar...@shiftmail.org> wrote:
On 10/10/2014 03:58, Chris Murphy wrote:
* mount -o recovery

        "Enable autorecovery attempts if a bad tree root is found at mount
        time."

I'm confused why it's not the default yet. Maybe it's continuing to
evolve at a pace that suggests something could sneak in that makes
things worse? It is almost an oxymoron in that I'm manually enabling an
autorecovery.

If true, maybe the closest indication we'd get of btrfs stability is the
default enabling of autorecovery.
No way!
I wouldn't want a default like that.

If you think about distributed transactions: suppose a sync was issued on
both sides of a distributed transaction, then power was lost on one side,
and then btrfs ended up with corruption. When I remount it, the worst thing
that could possibly happen is for it to silently auto-roll-back to a previous
known-good state.
For a general purpose file system, worse than losing 30 seconds (or less) of
questionably committed, likely corrupt, data is a file system that won't
mount without user intervention, one that requires a secret decoder ring to
get it to mount at all, and may require the use of specialized tools to
retrieve that data in any case.

The fail-safe behavior is to treat the known good tree root as the default
tree root, and bypass the bad tree root if it cannot be repaired, so that
the volume can be mounted with default mount options (i.e. the ones in
fstab). Otherwise it's a filesystem that isn't well suited for general
purpose use as rootfs, let alone for boot.
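
(To illustrate, "default mount options" here means nothing more exotic than
the usual fstab line, with the UUID being a placeholder:

        UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  btrfs  defaults  0  0

i.e. that line alone should get the volume mounted again after a bad
shutdown.)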

To understand this a bit better:

What can cause a recent tree to get corrupted?

Well, so far I have had the following cause corrupted trees:
1. Kernel panic during resume from ACPI S1 (suspend to RAM), which just happened to be in the middle of a tree commit.
2. Generic power loss during a tree commit.
3. A device not properly honoring write-barriers (the operations immediately adjacent to the write barrier weren't being ordered correctly all the time).

Based on what I know about BTRFS, the following could also cause problems:
1. A single-event-upset somewhere in the write path.
2. The kernel issuing a write to the wrong device (I haven't had this happen to me, but know people who have).

In general, any of these will cause problems for pretty much any filesystem, not just BTRFS.
I always thought that with a controller, device, and driver combination that
honors fsync, BTRFS would end up in either the new state or the last known
good state *anyway*. So where does the need to roll back arise from?

I think that in this case the term rollback is a bit ambiguous; here it means from the point of view of userspace, which sees the FS as having 'rolled back' from the most recent state to the last known good state.
That said, all journalling filesystems have some sort of rollback as far as I
understand: if the last journal entry is incomplete, they discard it on journal
replay. So even there you lose the last seconds of write activity.

But once fsync() returns, the data needs to be safe on disk. I always
thought BTRFS honors this under *any* circumstance. If some proposed
auto-rollback breaks this guarantee, I think something is broken elsewhere.

And fsync is an fsync is an fsync. Its semantics are clear as crystal. There
is nothing, absolutely nothing to discuss about it.

An fsync completes if the device itself reported "Yeah, I have the data on
disk, all safe and cool to go". Anything else is a bug IMO.

Or a hardware issue. Most filesystems need disks to properly honor write barriers to provide guaranteed semantics for fsync, and many consumer disk drives still don't honor them consistently.
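
To make the fsync contract being discussed concrete, here is a minimal C
sketch of what an application relying on it does; the file name is made up,
and it assumes the whole stack down to the drive cache actually honors the
flush:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "committed record\n";

    /* O_CREAT requires the mode argument. */
    int fd = open("journal.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write");
        close(fd);
        return EXIT_FAILURE;
    }

    /*
     * Once fsync() returns 0, the written data (and the metadata needed
     * to find it again) is supposed to be on stable storage.  If the
     * device only claims to have flushed its cache, this guarantee is
     * gone and no filesystem can restore it.
     */
    if (fsync(fd) != 0) {
        perror("fsync");
        close(fd);
        return EXIT_FAILURE;
    }

    close(fd);
    return EXIT_SUCCESS;
}

Only if that fsync() returns 0 may the application (or the other side of a
distributed transaction) assume the record will survive a crash.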
