> -----Original Message-----
> From: [email protected] <linux-btrfs-
> [email protected]> On Behalf Of Chris Murphy
> Sent: Thursday, 17 January 2019 5:15 AM
> To: Stefan K <[email protected]>
> Cc: Linux Btrfs <[email protected]>
> Subject: Re: question about creating a raid10
>
> On Wed, Jan 16, 2019 at 7:58 AM Stefan K <[email protected]> wrote:
> >
> > :(
> > that means when one jbod fails there is no guarantee that it still works
> > fine, like there is in zfs? Well, that sucks. Didn't anyone think to
> > program it that way?
>
> The mirroring is a function of the block group, not the block device.
> And yes that's part of the intentional design and why it's so flexible. A real
> raid10 isn't as flexible, so to enforce the allocation of specific block group
> stripes to specific block devices would add complexity to the allocator while
> reducing flexibility. It's not impossible; it'd just come with caveats: no
> three-device raid10 like now, and you'd have to figure out what to do if the
> user adds one new device instead of two at a time, what to do if a new device
> isn't the same size as the existing devices, and what to do if you add two
> devices that aren't the same size as each other. Do you refuse to add such
> devices? What limitations do we run into when rebalancing? It's way more
> complicated.
>
> Btrfs raid10 really should not be called raid10; it sets up entirely the
> wrong user expectation. It's more like raid0+1, except even that is
> deceptive, because with a legit raid0+1 you can in theory lose multiple
> drives on one side of the mirror (but not on both sides), whereas with Btrfs
> raid10 you really can't lose more than one drive at all. And therefore it
> does not scale: the probability of downtime increases as drives are added,
> whereas with a real raid10 it doesn't change.
>
> In your case you're better off with raid0'ing the two drives in each enclosure
> (whether it's a feature of the enclosure or doing it with mdadm or LVM). And
> then using Btrfs raid1 on top of the resulting virtual block devices. Or do
> mdadm/LVM raid10, and format it Btrfs. Or yeah, use ZFS.
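For reference, that first suggestion would look roughly like this (the md and
disk names below are only placeholders, not my hardware):

# stripe the two drives inside each enclosure
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
# then mirror across the enclosures with btrfs raid1
mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1
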
What I've done is create separate LVM logical volumes and volume groups, and
assign each volume group to separate physical storage, so I can't accidentally
end up with two btrfs mirror copies on the same device.
It's a bit complicated (especially since I'm using caching with LVM), but it
works very well; a rough sketch of the layout follows the output below.
vm-server ~ # lvs
  LV        VG Attr       LSize  Pool              Origin            Data%  Meta%  Move Log Cpy%Sync Convert
  backup-a  a  Cwi-aoC--- <2.73t [cache-backup-a]  [backup-a_corig]  99.99  24.63           0.00
  lvol0     a  -wi-a----- 44.00m
  lvol1     a  -wi-a----- 44.00m
  storage-a a  Cwi-aoC--- <3.64t [cache-storage-a] [storage-a_corig] 99.98  24.76           0.00
  backup-b  b  Cwi-aoC--- <2.73t [cache-backup-b]  [backup-b_corig]  99.99  24.75           0.00
  storage-b b  Cwi-aoC--- <3.64t [cache-storage-b] [storage-b_corig] 99.99  24.66           0.00
  storage-c c  -wi-a----- <3.64t
storage-c c -wi-a----- <3.64t
vm-server ~ # vgs
VG #PV #LV #SN Attr VSize VFree
a 3 4 0 wz--n- 6.70t 0
b 3 2 0 wz--n- 6.70t 8.00m
c 1 1 0 wz--n- <3.64t 0
vm-server ~ # pvs
PV VG Fmt Attr PSize PFree
/dev/sda4 a lvm2 a-- <342.00g 0
/dev/sdb4 b lvm2 a-- <342.00g 8.00m
/dev/sdc1 a lvm2 a-- <2.73t 0
/dev/sdd1 b lvm2 a-- <2.73t 0
/dev/sdf1 a lvm2 a-- <3.64t 0
/dev/sdh1 c lvm2 a-- <3.64t 0
/dev/sdi1 b lvm2 a-- <3.64t 0
vm-server ~ # btrfs fi sh
Label: 'Root' uuid: 58d27dbd-7c1e-4ef7-8d43-e93df1537b08
        Total devices 2 FS bytes used 55.26GiB
        devid 13 size 100.00GiB used 65.03GiB path /dev/sdb1
        devid 14 size 100.00GiB used 65.03GiB path /dev/sda1

Label: 'Boot' uuid: 8f63cd03-67b2-47cd-85ce-ca355769c123
        Total devices 2 FS bytes used 66.11MiB
        devid 1 size 1.00GiB used 356.00MiB path /dev/sdb6
        devid 2 size 1.00GiB used 0.00B path /dev/sda6

Label: 'Storage' uuid: 1438fdc5-8b2a-47b3-8a5b-eb74cde3df42
        Total devices 4 FS bytes used 2.85TiB
        devid 1 size 3.61TiB used 3.19TiB path /dev/mapper/b-storage--b
        devid 2 size 3.42TiB used 3.02TiB path /dev/mapper/a-storage--a
        devid 3 size 279.40GiB used 173.00GiB path /dev/sdg1
        devid 4 size 279.40GiB used 172.00GiB path /dev/sde1

Label: 'Backup' uuid: 21e59d66-3e88-4fc9-806f-69bde58be6a3
        Total devices 2 FS bytes used 1.31TiB
        devid 1 size 2.73TiB used 1.31TiB path /dev/mapper/a-backup--a
        devid 2 size 2.73TiB used 1.31TiB path /dev/mapper/b-backup--b
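
If anyone wants to copy the basic idea, it boils down to roughly the following;
the device names, sizes and cache details here are only illustrative, not the
exact commands I ran:

# one volume group per physical storage path, so the two copies of a btrfs
# raid1 block group can never end up on the same physical device
vgcreate a /dev/sda4 /dev/sdc1 /dev/sdf1    # SSD cache partition + big disks, side A
vgcreate b /dev/sdb4 /dev/sdd1 /dev/sdi1    # SSD cache partition + big disks, side B
lvcreate -n storage-a -L 3.6T a /dev/sdf1   # data LV restricted to the big disk
lvcreate -n storage-b -L 3.6T b /dev/sdi1
# optional: SSD-backed cache pool on each data LV (see lvmcache(7))
lvcreate --type cache-pool -n cache-storage-a -L 300G a /dev/sda4
lvconvert --type cache --cachepool a/cache-storage-a a/storage-a
lvcreate --type cache-pool -n cache-storage-b -L 300G b /dev/sdb4
lvconvert --type cache --cachepool b/cache-storage-b b/storage-b
# btrfs raid1 across LVs that are guaranteed to sit on different devices
mkfs.btrfs -d raid1 -m raid1 /dev/mapper/a-storage--a /dev/mapper/b-storage--b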