On 2017-10-16 21:14, Adam Borowski wrote:
On Mon, Oct 16, 2017 at 01:27:40PM -0400, Austin S. Hemmelgarn wrote:
On 2017-10-16 12:57, Zoltan wrote:
On Mon, Oct 16, 2017 at 1:53 PM, Austin S. Hemmelgarn wrote:
In an ideal situation, scrubbing should not be an 'only if needed' thing,
even for a regular array that isn't dealing with USB issues. From a
practical perspective, there's no way to know for certain if a scrub is
needed short of reading every single file in the filesystem in its
entirety, at which point, you're just better off running a scrub (because if
you _do_ need to scrub, you'll end up reading everything twice).
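
For reference, kicking a scrub off is a one-liner (the mountpoint /mnt here
is just a placeholder):

  # run in the foreground (-B) with per-device statistics (-d)
  btrfs scrub start -Bd /mnt

  # or background it and check on it later
  btrfs scrub start /mnt
  btrfs scrub status /mnt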

[...]  There are three things to deal with here:
1. Latent data corruption caused either by bit rot, or by a half-write (that
is, one copy got written successfully, but the device holding the second
copy disappeared _before_ it could be written).
2. Single chunks generated when the array is degraded.
3. Half-raid1 chunks generated by newer kernels when the array is degraded.

Note that any of the above other than bit rot affects only very recent data.
If we keep a record of the last known-good generation, all of that can be
enumerated, allowing us to make a selective scrub that checks only a small
part of the disk.  A linear read of an 8TB disk takes 14 hours...

If we ever get auto-recovery, this is a fine candidate.
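As a rough userspace approximation of the idea, btrfs subvolume find-new
can already enumerate files changed since a given generation (the path and
generation numbers below are made up for illustration):

  # while the array is known-good, record the current generation
  btrfs subvolume find-new /mnt/subvol 9999999
  # prints e.g.: transid marker was 123456

  # after a suspect event, list everything changed since then
  btrfs subvolume find-new /mnt/subvol 123456

It's only an approximation, of course: a plain read of the listed files
verifies one copy of each extent, not both, and it says nothing about
metadata.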
Indeed, and I think generational filtering may be one of the easier performance improvements here too.

Scrub will fix problem 1 because that's what it's designed to fix.  It will
also fix problem 3, since that behaves just like problem 1 from a
higher-level perspective.  It won't fix problem 2 though, as it doesn't look
at chunk types, only at whether the data in a chunk has the correct number
of valid copies.

Here not even tracking generations is required: a soft convert balance
touches only bad chunks.  Again, would work well for auto-recovery, as it's
a no-op if all is well.
However, it would require some minor changes relative to the current balance command, as newer kernels (are supposed to) generate half-raid1 chunks instead of single chunks, though those can also be fixed by scrub.
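
For the archives, the soft-conversion balance under discussion looks
roughly like this (raid1 profile and /mnt mountpoint assumed):

  # 'soft' skips chunks that already have the target profile, so this
  # is close to a no-op on a healthy raid1 array
  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt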

In contrast, the balance command you quoted won't fix problem 1 (because it
doesn't validate checksums or check that data has the right number of
copies), or problem 3 (because it's been told to only operate on non-raid1
chunks), but it will fix problem 2.

In comparison to both of the above, a full balance without filters will fix
all three issues, although it will do so less efficiently (in terms of both
time and disk usage) than running a soft-conversion balance followed by a
scrub.
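
In other words, roughly (same assumptions as above):

  # full balance: rewrites every chunk, slow
  btrfs balance start /mnt

  # vs. the cheaper combination:
  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt
  btrfs scrub start -Bd /mnt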

"less efficiently" is an understatement.  Scrub gets a good part of
theoretical linear speed, while I just had a single metadata block take
14428 seconds to balance.
Yeah, the metadata especially can get pretty bad.

In the case of normal usage, device disconnects are rare, so you should
generally be more worried about latent data corruption.

Yeah, but certain setups (like anything USB) get disconnects quite often.
It would be nice to get them right.  MD, thanks to its write-intent bitmap,
can recover almost instantly; btrfs could do even better -- but the code to
do so isn't written yet.
The write-intent bitmap is also exponentially easier to implement than what would be needed for BTRFS.
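
For comparison, enabling the bitmap on an existing md array is a single
command (/dev/md0 assumed):

  # add an internal write-intent bitmap to an existing array
  mdadm --grow --bitmap=internal /dev/md0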

[...] monitor the kernel log to watch for device disconnects, remount the
filesystem when the device reconnects, and then run the balance command
followed by a scrub.  With most hardware I've seen, USB disconnects tend to
be relatively frequent unless you're using very high quality cabling and
peripheral devices.  If, however, they happen less than once a day most of
the time, just set up the log monitor to remount, and set the balance and
scrub commands on the schedule I suggested above for normal usage.
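
Something like the following sketch, purely illustrative: the device label,
mountpoint, and polling interval are made up, and it polls for the device
instead of parsing the kernel log (a real setup would want udev rules or a
systemd unit instead):

  #!/bin/sh
  # naive recovery loop: when the device is present but the filesystem
  # isn't mounted, remount it and run the soft balance + scrub sequence
  DEV=/dev/disk/by-label/myarray
  MNT=/mnt

  while sleep 60; do
      if [ -e "$DEV" ] && ! mountpoint -q "$MNT"; then
          mount "$DEV" "$MNT" || continue
          btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft "$MNT"
          btrfs scrub start -Bd "$MNT"
      fi
  done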

A day-long recovery for an event that happens daily isn't a particularly
enticing prospect.
I forget sometimes that people insist on storing large volumes of data on unreliable storage...