On Tue, Nov 29, 2016 at 3:34 PM, Wilson Meier <wilson.me...@gmail.com> wrote:
> On 29.11.2016 18:54, Austin S. Hemmelgarn wrote:
>> On 2016-11-29 12:20, Florian Lindner wrote:
>>> Hello,
>>>
>>> I have 4 harddisks with 3TB capacity each. They are all used in a
>>> btrfs RAID 5. It has come to my attention that there
>>> seem to be major flaws in btrfs' raid 5 implementation. Because of
>>> that, I want to convert the raid 5 to a raid 10
>>> and I have several questions.
>>>
>>> * Is that possible as an online conversion?
>> Yes, as long as you have a complete array to begin with (converting from
>> a degraded raid5/6 array has the same issues as rebuilding a degraded
>> raid5/6 array).
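>>
>> (A quick way to confirm the array isn't degraded before converting,
>> assuming /mnt is the mount point: 'btrfs filesystem show /mnt' should
>> list all four devices with none reported missing.)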
>>>
>>> * Since my effective capacity will shrink during conversions, does
>>> btrfs check if there is enough free capacity to
>>> convert? As you see below, right now it's probably too full, but I'm
>>> going to delete some stuff.
>> No, you'll have to do the math yourself.  This would be a great project
>> idea to place on the wiki though.
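>>
>> As a rough sanity check (assuming 4 equal 3 TB devices and ignoring
>> metadata overhead): raid5 gives about (4-1) x 3 = 9 TB of usable data
>> space, while raid10 gives 4 x 3 / 2 = 6 TB, so whatever 'btrfs
>> filesystem df /mnt' reports as data used needs to fit comfortably
>> under ~6 TB before starting the conversion.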
>>>
>>> * I understand the command to convert is
>>>
>>> btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
>>>
>>> Correct?
>> Yes, but I would personally convert the metadata first, then the data.
>> The raid10 profile gets better performance than raid5, so converting
>> the metadata first (by issuing a balance covering just the metadata)
>> should speed up the data conversion a bit.
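>>
>> Roughly, assuming the filesystem is mounted at /mnt, that two-step
>> sequence would look like:
>>
>> btrfs balance start -mconvert=raid10 /mnt
>> btrfs balance start -dconvert=raid10 /mnt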
>>>
>>> * What disks are allowed to fail? My understanding of a raid 10 is
>>> like that
>>>
>>> disks = {a, b, c, d}
>>>
>>> raid0( raid1(a, b), raid1(c, d) )
>>>
>>> This way (a XOR b) AND (c XOR d) are allowed to fail without the raid
>>> failing (i.e. either a or b, and either c or d, may fail).
>>>
>>> How is that with a btrfs raid 10?
>> A BTRFS raid10 can only sustain one disk failure.  Ideally, it would
>> work like you show, but in practice it doesn't.
> I'm a little bit concerned right now. I migrated my 4 disk raid6 to
> raid10 because of the known raid5/6 problems. I assumed that btrfs
> raid10 can handle 2 disk failures as long as they occur in different
> stripes.
> Could you please point out why it cannot sustain 2 disk failures?

Conventional raid10 has a fixed assignment of which drives form
mirrored pairs. Btrfs doesn't mirror at the device level but at the
chunk level, and a chunk's stripe number is not pinned to a particular
device, so a given device can end up holding more than one stripe
number across different chunks. What that means is that the loss of two
devices has a pretty decent chance of taking out both copies of some
chunk, whereas a conventional RAID 10 only loses data when both drives
of the same mirrored pair fail.
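
As a contrived illustration (made-up layout, not real output): suppose
chunk 1 keeps its two copies on devices A+B, chunk 2 on C+D, and chunk
3 on B+C. Losing B and C then destroys both copies of chunk 3, even
though B and C would sit in different mirrored pairs in a conventional
raid10.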

With very cursory testing, what I've found is that btrfs-progs
establishes an initial stripe-number-to-device mapping that's different
from the kernel code's. The kernel code appears to be pretty consistent
as long as the member devices are identically sized. So it's probably
not an unfixable problem, but the effect is that right now the Btrfs
raid10 profile behaves more like raid0+1.

You can use
$ sudo btrfs insp dump-tr -t 3 /dev/

That will dump the chunk tree ('insp dump-tr' is the abbreviated form
of 'inspect-internal dump-tree', and tree 3 is the chunk tree), and you
can see whether any device has more than one chunk stripe number
associated with it.
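
For example (the device path is just a placeholder, and I'm assuming
the stripe lines in the dump look like 'stripe N devid M ...'):

$ sudo btrfs insp dump-tr -t 3 /dev/sdX | grep -E 'stripe [0-9]+ devid'

If the same devid shows up under more than one stripe number across
chunks, that device isn't acting as a fixed member of a single mirrored
pair.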


-- 
Chris Murphy
