On Wed, Sep 21, 2016 at 8:08 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
> At 09/21/2016 11:13 PM, Chris Murphy wrote:

>> I understand some things should go in fsck for comparison. But in this
>> case I don't see how it can help. Parity is not checksummed. The only
>> way to know if it's wrong is to read all of the data strips, compute
>> parity, and compare the in-memory parity from the current read to the
>> on-disk parity.
> That's what we plan to do.
> And I don't see the necessity of csumming the parity.
> Why csum a csum again?
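
Fair enough. To spell out the check I described: read every data strip in
the horizontal stripe, XOR them together, and compare the result against
the parity strip read from disk. A self-contained sketch of just that
comparison (none of this is btrfs code; the 64KiB strip size is only the
btrfs default):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define STRIP_SIZE 65536    /* 64KiB, the btrfs per-device stripe length */

/* Recompute parity from the data strips of one horizontal stripe and
 * compare it with the parity strip read from disk. Returns 0 on match. */
static int verify_parity(uint8_t *data[], int ndata, const uint8_t *parity)
{
    uint8_t computed[STRIP_SIZE] = { 0 };

    for (int i = 0; i < ndata; i++)
        for (size_t b = 0; b < STRIP_SIZE; b++)
            computed[b] ^= data[i][b];

    return memcmp(computed, parity, STRIP_SIZE) ? -1 : 0;
}

int main(void)
{
    static uint8_t d0[STRIP_SIZE], d1[STRIP_SIZE], p[STRIP_SIZE];

    memset(d0, 0xaa, sizeof(d0));
    memset(d1, 0x55, sizeof(d1));
    memset(p,  0xff, sizeof(p));        /* 0xaa ^ 0x55 == 0xff */

    uint8_t *strips[] = { d0, d1 };
    printf("parity %s\n", verify_parity(strips, 2, p) ? "MISMATCH" : "ok");
    return 0;
}

The expensive part is the reads, not the XOR, which is why this only
really makes sense as an offline pass.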

>> There is already an offline scrub in btrfs
>> check which doesn't repair, but also I don't know if it checks parity.
>>        --check-data-csum
>>            verify checksums of data blocks
> Just as you expected, it doesn't check parity.
> Even for RAID1/DUP, it won't check the second copy if it succeeds in
> reading the first one.

Both copies are not scrubbed? Oh hell...
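
(For reference, the offline check being discussed is run against an
unmounted filesystem, something like: sudo btrfs check --check-data-csum
/dev/sdX, where the device name is just a placeholder.)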

[chris@f24s ~]$ sudo btrfs scrub status /brick2
scrub status for 7fea4120-9581-43cb-ab07-6631757c0b55
    scrub started at Tue Sep 20 12:16:18 2016 and finished after 01:46:58
    total bytes scrubbed: 955.93GiB with 0 errors

How can it possibly be correct to say 956GiB was scrubbed if it has not
checked both copies? That message says *all* the data, both copies, was
scrubbed. Are you saying that message is wrong, and it only scrubbed half
that amount?

[chris@f24s ~]$ sudo btrfs fi df /brick2
Data, RAID1: total=478.00GiB, used=477.11GiB
System, RAID1: total=32.00MiB, used=96.00KiB
Metadata, RAID1: total=2.00GiB, used=877.59MiB
GlobalReserve, single: total=304.00MiB, used=0.00B

When that scrub was happening, both drives were being accessed at 100%.
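
And the numbers only add up if every copy was read:

    2 x 477.11 GiB data      =  954.22 GiB
    2 x 877.59 MiB metadata  =    1.71 GiB
    2 x  96.00 KiB system    =   ~0
                       total  =  955.93 GiB

which is exactly what scrub reported.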

> The current implementation doesn't really care whether it's the data or
> the copy that is corrupted; as long as the data can be read out, there is
> no problem.
> The same thing applies to tree blocks.
> So the ability to check every stripe/copy is still quite needed for that
> option.
> And that's what I'm planning to enhance: make --check-data-csum
> equivalent to kernel scrub.

OK thanks.
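
For what it's worth, here is a toy model of what "check every stripe/copy"
would mean for RAID1: verify each mirror against the stored csum
independently, instead of stopping at the first successful read. The
in-memory buffers stand in for per-device reads, "stored" stands in for
the csum-tree entry, and the bitwise CRC32C below ignores btrfs's exact
seed/finalization details; none of it is btrfs-progs code.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli), the checksum family btrfs stores for data.
 * Slow but dependency-free, which keeps the sketch self-contained. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xffffffff;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82f63b78 & -(crc & 1));
    }
    return crc ^ 0xffffffff;
}

int main(void)
{
    uint8_t mirror1[4096], mirror2[4096];

    memset(mirror1, 0xab, sizeof(mirror1));
    memcpy(mirror2, mirror1, sizeof(mirror2));
    mirror2[100] ^= 0x01;               /* latent corruption on copy 2 */

    /* One csum per block; every copy has to match it. */
    uint32_t stored = crc32c(mirror1, sizeof(mirror1));
    const uint8_t *mirrors[] = { mirror1, mirror2 };

    for (int m = 0; m < 2; m++)
        printf("mirror %d: %s\n", m + 1,
               crc32c(mirrors[m], sizeof(mirror1)) == stored
                   ? "csum ok" : "CSUM MISMATCH");
    return 0;
}

Today the offline check stops after "mirror 1: csum ok"; the point of the
enhancement is to keep going and catch mirror 2.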

>>            This expects that the filesystem is otherwise OK, so this is
>>            basically an offline scrub but does not repair data from
>>            spare copies.
> Repair can be implemented, but it may just rewrite the same data into the
> same place.
> If that's a bad block, then it can't repair any further unless we can
> relocate the extent to another place.

Any device that's out of reserve sectors and can no longer remap LBAs on
its own is a drive that needs to be decommissioned. It's only in the last
year or so that mdadm gained a badblocks map, so it can do what the drive
won't do, but I'm personally not a fan of keeping malfunctioning drives in
a RAID.

>> Is it possible to put parities into their own tree? They'd be
>> checksummed there.
> Personally speaking, this seems like quite a bad idea to me.
> I prefer to keep different logical layers in their own code rather than
> mixing them together: block-level things at the block level (RAID/chunk),
> logical things at the logical level (tree blocks).

> The current btrfs csum design is already much, much better than pure RAID.
> Just think of RAID1: when one copy is corrupted, which copy is the
> correct one?

Chris Murphy