On 2019-01-16 13:15, Chris Murphy wrote:
On Wed, Jan 16, 2019 at 7:58 AM Stefan K <shado...@gmx.net> wrote:

  :(
that means when one JBOD fails, there is no guarantee that it keeps
working like it would in ZFS? Well, that sucks.
Didn't anyone think to program it that way?

The mirroring is a function of the block group, not the block device.
And yes that's part of the intentional design and why it's so
flexible. A real raid10 isn't as flexible, so to enforce the
allocation of specific block group stripes to specific block devices
would add complexity to the allocator while reducing flexibility. It's
not impossible; it would just come with caveats, such as no longer
being able to do a three-device raid10 the way you can now. You'd also
have to figure out what to do if the user adds one new device instead
of two at a time, if a new device isn't the same size as the existing
devices, or if the two added devices aren't the same size as each
other. Do you refuse to add such devices?
What limitations do we run into when rebalancing? It's way more
complicated.
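
(You can see this chunk-level allocation for yourself with
btrfs-progs; the mount point below is just an example:

    # per-device breakdown of allocated chunks, by profile
    btrfs device usage /mnt
    # filesystem-wide totals per profile (Data/Metadata/System)
    btrfs filesystem df /mnt

The output is reported per block group profile rather than per fixed
mirror pair of devices, which is exactly the flexibility described
above.)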

Btrfs raid10 really should not be called raid10. It sets up the wrong
user expectation entirely. It's more like raid0+1, except even that is
deceptive: in theory, with a legit raid0+1 you can lose multiple
drives on one side of the mirror (but not on both sides), while with
Btrfs raid10 you really can't lose more than one drive. And therefore it does not
scale. The probability of downtime increases as drives are added;
whereas with a real raid10 downtime doesn't change.

In your case you're better off raid0'ing the two drives in each
enclosure (whether that's a feature of the enclosure, or done with
mdadm or LVM), and then using Btrfs raid1 on top of the resulting
virtual block devices. Or do mdadm/LVM raid10 and format it Btrfs. Or
yeah, use ZFS.
I was about to recommend the same BTRFS raid1 on top of MD or LVM RAID0 approach myself. Not only will it get you as close as possible with BTRFS to the ZFS configuration you posted, but it will also net you slightly better performance than BTRFS in raid10 mode.
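
For reference, a minimal sketch of that layout with mdadm and two two-disk enclosures (the device names and mount point are placeholders, adjust them to your hardware):

    # one raid0 array per enclosure, so the enclosure is the failure domain
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc /dev/sdd

    # btrfs raid1 (data and metadata) across the two virtual devices
    mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1
    mount /dev/md0 /mnt

With that layout, losing an entire enclosure only takes out one side of the BTRFS raid1, so the filesystem stays usable (mounted degraded) until you rebuild.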

Realistically, it's not perfect: if you lose one of the JBOD arrays, you have to rebuild that array completely and then replace the higher-level device in BTRFS, instead of just replacing the disk at the lower level. But the same approach can be extrapolated to cover a wide variety of configurations in terms of required failure domains, and I can attest that it works (I've used this configuration a lot myself, though mostly for performance reasons, not reliability).
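
To make the "rebuild, then replace" part concrete, recovery after losing an enclosure would look roughly like this (again, the device names and the devid are placeholders; check 'btrfs filesystem show' for the real devid of the missing device):

    # recreate the raid0 array on the replacement disks
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc /dev/sdd

    # mount degraded if the filesystem isn't already mounted
    mount -o degraded /dev/md0 /mnt

    # rebuild the btrfs mirror onto the new virtual device, referring
    # to the missing device by its devid (assumed here to be 2)
    btrfs replace start 2 /dev/md1 /mnt
    btrfs replace status /mnt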
