On Tue, Nov 29, 2016 at 3:34 PM, Wilson Meier <wilson.me...@gmail.com> wrote:
> On 29.11.2016 18:54, Austin S. Hemmelgarn wrote:
>> On 2016-11-29 12:20, Florian Lindner wrote:
>>> Hello,
>>>
>>> I have 4 hard disks with 3 TB capacity each. They are all used in a
>>> btrfs RAID 5. It has come to my attention that there seem to be major
>>> flaws in btrfs' raid 5 implementation. Because of that, I want to
>>> convert the raid 5 to a raid 10, and I have several questions.
>>>
>>> * Is that possible as an online conversion?
>>
>> Yes, as long as you have a complete array to begin with (converting from
>> a degraded raid5/6 array has the same issues as rebuilding a degraded
>> raid5/6 array).
>>
>>> * Since my effective capacity will shrink during conversion, does
>>> btrfs check whether there is enough free capacity to convert? As you
>>> can see below, right now it's probably too full, but I'm going to
>>> delete some stuff.
>>
>> No, you'll have to do the math yourself. This would be a great project
>> idea to place on the wiki, though.
>>
>>> * I understand the command to convert is
>>>
>>>     btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt
>>>
>>> Correct?
>>
>> Yes, but I would personally convert the metadata first, then the data.
>> The raid10 profile gets better performance than raid5, so converting
>> the metadata first (by issuing a balance covering only the metadata)
>> should speed up the data conversion a bit.
>>
>>> * Which disks are allowed to fail? My understanding of a raid 10 is
>>> like this:
>>>
>>>     disks = {a, b, c, d}
>>>
>>>     raid0( raid1(a, b), raid1(c, d) )
>>>
>>> This way, (a XOR b) AND (c XOR d) are allowed to fail without the
>>> array failing (either a or b, and either c or d, may fail).
>>>
>>> How is that with a btrfs raid 10?
>>
>> A BTRFS raid10 can only sustain one disk failure. Ideally, it would
>> work like you show, but in practice it doesn't.
>
> I'm a little bit concerned right now. I migrated my 4 disk raid6 to
> raid10 because of the known raid5/6 problems. I assumed that btrfs
> raid10 can handle 2 disk failures as long as they occur in different
> stripes. Could you please point out why it cannot sustain 2 disk
> failures?
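For reference, the metadata-first conversion Austin suggests would look something like the following sketch; `/mnt` here stands in for the actual mount point of the array:

```shell
# Sketch of a two-phase raid5 -> raid10 conversion (metadata first,
# then data), as suggested above. /mnt is a placeholder mount point.
btrfs balance start -mconvert=raid10 /mnt   # convert metadata chunks first
btrfs balance start -dconvert=raid10 /mnt   # then convert data chunks
```

Both commands run against a mounted filesystem, so the conversion is online; the data pass will be by far the slower of the two.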
Conventional raid10 has a fixed assignment of which drives form mirrored
pairs. Btrfs doesn't do this at the device level, but at the chunk
level, and a chunk's stripe number is not pinned to a particular device,
so a given device can end up holding more than one stripe number across
chunks. What that means is that the loss of two devices has a pretty
decent chance of destroying both copies of some chunk, whereas a
conventional RAID 10 only loses data when both drives of the same
mirrored pair fail.

With very cursory testing, what I've found is that btrfs-progs
establishes an initial stripe-number-to-device mapping that differs from
the one the kernel code uses. The kernel code appears to be pretty
consistent as long as the member devices are identically sized. So it's
probably not an unfixable problem, but the effect right now is that the
Btrfs raid10 profile behaves more like raid0+1.

You can check with:

$ sudo btrfs insp dump-tr -t 3 /dev/

That will dump the chunk tree, and you can see whether any device has
more than one chunk stripe number associated with it.

-- 
Chris Murphy
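To make the chunk-tree output easier to eyeball, here is a small sketch that extracts the stripe-to-device mapping per chunk. The sample text is illustrative, not from a real filesystem, though it mimics the `stripe N devid M offset O` lines that `dump-tree -t 3` prints for each CHUNK_ITEM; as I understand the raid10 layout, mirror copies live on adjacent stripe pairs (0/1, 2/3), so in the second sample chunk losing devid 2 and devid 3 together would destroy both copies of that chunk's first stripe pair:

```python
# Sketch: extract each chunk's (stripe index, devid) mapping from
# `btrfs inspect-internal dump-tree -t 3` output. SAMPLE is made up
# for illustration; real output has the same stripe/devid line shape.
import re
from collections import defaultdict

SAMPLE = """\
item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 2156920832) itemoff 15557 itemsize 176
    chunk length 2147483648 owner 2 stripe_len 65536
    type DATA|RAID10 num_stripes 4
        stripe 0 devid 1 offset 1094713344
        stripe 1 devid 2 offset 1094713344
        stripe 2 devid 3 offset 1085276160
        stripe 3 devid 4 offset 1085276160
item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 4304404480) itemoff 15381 itemsize 176
    chunk length 2147483648 owner 2 stripe_len 65536
    type DATA|RAID10 num_stripes 4
        stripe 0 devid 2 offset 3232759808
        stripe 1 devid 3 offset 3242196992
        stripe 2 devid 1 offset 3232759808
        stripe 3 devid 4 offset 3242196992
"""

def stripe_map(dump):
    """Map chunk logical offset -> ordered list of (stripe, devid)."""
    chunks = defaultdict(list)
    chunk = None
    for line in dump.splitlines():
        m = re.search(r'CHUNK_ITEM (\d+)', line)
        if m:
            chunk = int(m.group(1))
            continue
        m = re.search(r'stripe (\d+) devid (\d+)', line)
        if m and chunk is not None:
            chunks[chunk].append((int(m.group(1)), int(m.group(2))))
    return dict(chunks)

for chunk, stripes in stripe_map(SAMPLE).items():
    print(chunk, stripes)
```

If the same stripe index maps to different devids in different chunks (as it does in the sample), no single two-device failure is safe across all chunks, which is the raid0+1-like behavior described above.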