On 02/06/2018 07:15 AM, Liu Bo wrote:
Btrfs tries its best to tolerate write errors, but kind of silently
(except some messages in kernel log).
For raid1 and raid10, this is usually not a problem because there is a
copy as backup, while for parity based raid setup, i.e. raid5 and
raid6, the problem is that, if a write error occurs due to some bad
sectors, one horizontal stripe becomes degraded and the number of write
errors it can tolerate is reduced by one; if two disks then fail,
data may be lost forever.
This is equally true in raid1, raid10, and raid5. Sorry, I didn't get
why a degraded stripe is critical only to the parity based
And does fixing parity based stripes really need a bad chunk list, or
can a balance without the bad chunks list fix them as well?
One way to mitigate the data loss pain is to expose 'bad chunks',
i.e. degraded chunks, to users, so that they can use 'btrfs balance'
to relocate the whole chunk and get the full raid6 protection again
(if the relocation works).
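
A minimal sketch of what such a targeted relocation could look like from
userspace, assuming the chunk's logical start and length are already
known. The helper name balance_command and its parameters are
hypothetical; only the vrange balance filter itself is an existing
btrfs-progs feature, and its exact bound semantics should be checked
against btrfs-balance(8).

```python
def balance_command(mount_point, chunk_start, chunk_length, chunk_type="d"):
    """Build a `btrfs balance start` command limited to one chunk.

    chunk_type is "d" (data), "m" (metadata) or "s" (system) and picks
    which filter option carries the vrange. Hypothetical helper: it only
    builds the argument list and does not run anything.
    """
    if chunk_type not in ("d", "m", "s"):
        raise ValueError("chunk_type must be 'd', 'm' or 's'")
    if chunk_start < 0 or chunk_length <= 0:
        raise ValueError("chunk_start must be >= 0 and chunk_length > 0")
    # vrange selects block groups overlapping this logical byte range.
    chunk_end = chunk_start + chunk_length
    vrange = f"-{chunk_type}vrange={chunk_start}..{chunk_end}"
    return ["btrfs", "balance", "start", vrange, mount_point]


if __name__ == "__main__":
    # Example values only; a real start/length would come from the
    # degraded chunk reported by the kernel.
    print(" ".join(balance_command("/mnt", 1048576, 1073741824)))
```

The command is returned as a list so that it can be handed to
subprocess.run() without shell quoting issues.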
Depending on the type of disk error, the recovery action would vary. For
example, it can be a complete disk failure or an interim RW failure due
to environmental/transport factors. The disk's own auto relocation will
do the job of relocating the real bad blocks on most modern disks.
The challenging task will be to know where to draw the line between
complete disk failure (failed) vs interim disk failure (offline), so I
had plans of making it tunable based on the number of disk errors.
If it's confirmed that a disk has failed, auto-replace with the hot
spare disk will be its recovery action. Balance with a failed disk won't
Patches for these are on the ML.
If the failure is momentary due to environmental factors, including the
transport layer, then since we expect the disk with the data to come
back, we shouldn't kick in the hot spare. That is the disk state
offline, or maybe it's a state where reading old data is fine but new
data cannot be written.
I think you are addressing this interim state. It's better to define the
disk states first so that their recovery actions can be defined. I can
revise the patches for that, so that replace vs. re-balance using bad
chunks can be decided.
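
To make the idea concrete, here is a rough sketch of disk states mapped
to recovery actions, with the failed/offline line drawn by a tunable
error count. Everything in it is an assumption for illustration: the
state names, the default threshold of 10, and the action strings are
hypothetical and not taken from the patches.

```python
from enum import Enum


class DiskState(Enum):
    ONLINE = "online"    # no errors seen
    OFFLINE = "offline"  # interim failure, disk expected to come back
    FAILED = "failed"    # complete failure


# Hypothetical mapping from state to recovery action.
RECOVERY_ACTION = {
    DiskState.ONLINE: "none",
    DiskState.OFFLINE: "wait for the disk, then re-balance bad chunks",
    DiskState.FAILED: "replace with hot spare",
}


def classify_disk(error_count, failed_threshold=10):
    """Classify a disk from its error count.

    failed_threshold is the tunable: at or above it the disk is treated
    as failed, below it (but non-zero) as offline.
    """
    if error_count < 0 or failed_threshold < 1:
        raise ValueError("invalid error_count or failed_threshold")
    if error_count == 0:
        return DiskState.ONLINE
    if error_count >= failed_threshold:
        return DiskState.FAILED
    return DiskState.OFFLINE


def recovery_action(error_count, failed_threshold=10):
    """Return the recovery action for the classified disk state."""
    return RECOVERY_ACTION[classify_disk(error_count, failed_threshold)]
```

With the states defined up front, the replace vs. re-balance decision
becomes a table lookup instead of being scattered over the error paths.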
This introduces 'bad_chunks' in btrfs's per-fs sysfs directory. Once
a chunk of raid5 or raid6 becomes degraded, it will appear in
AFAIK a variable list of output is not allowed on sysfs.
IMHO a list of bad chunks won't help the user (it's OK if it's needed by
the kernel). It would help if you provided the list of affected files
so that the user can use it in a script to make an additional interim
external copy until the disk recovers from the interim error.
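
As an illustration of the affected-files idea, a sketch that matches
file extents against bad chunk ranges by logical address overlap. The
input shapes are assumptions: file_extents maps a path to a list of
(logical_start, length) pairs and bad_chunks is a list of
(logical_start, length) pairs. How that data is collected is left out
here.

```python
def affected_files(file_extents, bad_chunks):
    """Return the sorted paths whose extents overlap any bad chunk.

    file_extents: dict of path -> list of (logical_start, length)
    bad_chunks:   list of (logical_start, length)
    Ranges are treated as half-open: [start, start + length).
    """
    hits = []
    for path, extents in file_extents.items():
        overlaps = any(
            e_start < c_start + c_len and c_start < e_start + e_len
            for e_start, e_len in extents
            for c_start, c_len in bad_chunks
        )
        if overlaps:
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Made-up example data, not real filesystem output.
    extents = {
        "/mnt/a.txt": [(0, 4096)],
        "/mnt/b.bin": [(8192, 4096), (1048576, 8192)],
    }
    for path in affected_files(extents, [(1048576, 1073741824)]):
        print(path)
```

The resulting list is what a user script could feed to cp or rsync to
make the interim external copy.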