On Thu, Feb 7, 2019 at 10:37 AM Martin Steigerwald <mar...@lichtvoll.de> wrote:
>
> Chris Murphy - 07.02.19, 18:15:
> > > So please change the normal behavior
> >
> > In the case of no device loss, but device delay, with 'degraded' set
> > in fstab you risk a non-deterministic degraded mount. And there is no
> > automatic balance (sync) after recovering from a degraded mount. And
> > as far as I know there's no automatic transition from degraded to
> > normal operation upon later discovery of a previously missing device.
> > It's just begging for data loss. That's why it's not the default.
> > That's why it's not recommended.
>
> Still the current behavior is not really user-friendly. And does not
> meet expectations that users usually have about how RAID 1 works. I know
> BTRFS RAID 1 is no RAID 1, although it is called like this.
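As an aside, to make concrete what "no automatic balance (sync) after
recovering from a degraded mount" means: chunks written while degraded
are, as far as I know, allocated with the single profile, and getting
back to fully redundant raid1 once the device returns is entirely
manual today. A sketch of the steps a policy daemon could automate,
with the mount point as a placeholder, not a real tool:

import subprocess

# Sketch only: the mount point is a placeholder, and this assumes the
# formerly missing device is back and the fs is mounted normally.
MOUNTPOINT = "/srv/data"

def resync_after_degraded(mountpoint: str) -> None:
    # Scrub rewrites stale or corrupt copies on the returned device,
    # using csums to decide which copy is good.
    subprocess.run(["btrfs", "scrub", "start", "-B", mountpoint],
                   check=True)
    # Chunks written while degraded were allocated as 'single'; the
    # 'soft' filter converts only those back to raid1 instead of
    # rewriting everything.
    subprocess.run(["btrfs", "balance", "start",
                    "-dconvert=raid1,soft", "-mconvert=raid1,soft",
                    mountpoint],
                   check=True)

resync_after_degraded(MOUNTPOINT)
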
I mentioned in both my Feb 2 and Feb 5 responses that the user
experience is not good compared to mdadm and lvm raid1 in the same
situation. However, the raid1 term only describes replication. It
doesn't describe any policy. Whether to fail to mount or to mount
degraded by default is a policy. Whether and how to transition from
degraded to normal operation when a formerly missing device reappears
is a policy. Whether, how, and when to rebuild data after resuming
normal operation is a policy. A big part of why these policies are MIA
is that they require features that just don't exist yet, and that
perhaps don't even belong in btrfs kernel code or user space tools, but
rather in a system service or daemon that manages such policies.
However, none of that means Btrfs raid1 is not raid1. There's a wrong
assumption that the policies and features in mdadm and LVM are somehow
attached to the definition of raid1; they aren't.

> I also somewhat get that with the current state of BTRFS the current
> behavior of not allowing a degraded mount may be better… however… I see
> clearly room for improvement here. And there very likely will be
> discussions like this on this list… until BTRFS acts in a more user
> friendly way here.

And it's completely appropriate if someone wants to update the Btrfs
status page to make clearer what features/behaviors/policies apply to
Btrfs raid of all types, or to add a page that summarizes the
differences relative to mdadm and/or LVM raid levels, so users can
better assess their risk and choose the best Linux storage technology
for their use case. But at least developers know this is the case.

And actually, you could mitigate a decent number of Btrfs's missing
features with server monitoring tools, including ones that parse kernel
messages. Right now you aren't even informed of read or write errors,
device errors, or csum mismatches and fixups, unless you're checking
kernel messages yourself. mdadm has an option for emailing
notifications to an admin for such things, and lvm has a monitor,
though I haven't used it so I can't say what it covers. Btrfs will
literally only complain about failed writes, the kind of errors that
would cause immediate ejection of the device by md.
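As a rough sketch of the kind of shim I mean, something like mdadm's
MAILADDR bolted on from the outside. Everything here (the address, the
SMTP setup, the message pattern) is an illustrative guess, not a real
tool:

import re
import smtplib
import subprocess
from email.message import EmailMessage

# Placeholders: where the mail goes and what counts as interesting.
ADMIN = "root@localhost"
PATTERN = re.compile(r"BTRFS (error|warning|critical)|csum failed|i/o error",
                     re.IGNORECASE)

def notify(line: str) -> None:
    # Assumes a local MTA listening on port 25, the same thing mdadm's
    # mail notifications rely on.
    msg = EmailMessage()
    msg["Subject"] = "btrfs kernel message"
    msg["From"] = ADMIN
    msg["To"] = ADMIN
    msg.set_content(line)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

def watch() -> None:
    # journalctl -kf follows kernel messages as they arrive.
    with subprocess.Popen(["journalctl", "-kf", "-o", "cat"],
                          stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if PATTERN.search(line):
                notify(line.rstrip())

watch()

--
Chris Murphy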