On 2017-12-18 14:43, Tomasz Pala wrote:
> On Mon, Dec 18, 2017 at 08:06:57 -0500, Austin S. Hemmelgarn wrote:

>> The fact is, the only cases where this is really an issue is if you've
>> either got intermittently bad hardware, or are dealing with external

> Well, RAID1+ is all about failing hardware.
About catastrophically failing hardware, not intermittently failing hardware.

>> storage devices.  For the majority of people who are using multi-device
>> setups, the common case is internally connected fixed storage devices
>> with properly working hardware, and for that use case, it works
>> perfectly fine.

> If you're talking about "RAID"-0 or storage pools (volume management),
> that is true.
> But if you imply that RAID1+ "works perfectly fine as long as the
> hardware works fine", that is fundamentally wrong.  If the hardware
> needs to work properly for the RAID to work properly, no one would need
> the RAID in the first place.
I never said the hardware needed to not fail, just that it needed to
fail in a consistent manner.  BTRFS handles catastrophic failures of
storage devices just fine right now.  It has issues with intermittent
failures, but so does hardware RAID, and so do MD and LVM to a lesser
degree.
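For what it's worth, BTRFS does at least make an intermittently flaky
device visible before it drops out entirely, since the per-device error
counters persist across mounts.  A quick check (the mount point is a
placeholder for your own volume):

   # read/write/flush/corruption/generation error counters, per device
   btrfs device stats /mnt
   # reset the counters once a suspect device has been replaced or recabled
   btrfs device stats -z /mnt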

>> that BTRFS should not care.  At the point at which a device is dropping
>> off the bus and reappearing with enough regularity for this to be an
>> issue, you have absolutely no idea how else it's corrupting your data,
>> and support of such a situation is beyond any filesystem (including ZFS).

> Support for such a situation is exactly what RAID provides.  So don't
> blame people for expecting it to be handled as long as you call the
> filesystem feature 'RAID'.
No, classical RAID (other than RAID0) is supposed to handle catastrophic
failure of component devices.  That is the entirety of the original
design purpose, and that is the entirety of what you should be using it
for in production.  Once you are getting random corruption from a disk
and you're using anything but BTRFS for replication, you _NEED_ to
replace that disk, and if you don't, you risk it causing corruption on
the other disk.  As of right now, BTRFS is no different in that respect,
but I agree that it _should_ be able to handle such a situation
eventually.
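For illustration, replacing a suspect disk on a two-device BTRFS raid1
can be done online; the device names and mount point below are
placeholders for whatever your setup actually uses:

   # swap the suspect disk for the new one while the volume stays mounted
   btrfs replace start /dev/sdb /dev/sdc /mnt
   btrfs replace status /mnt
   # then re-verify every copy; blocks that fail their checksum get
   # rewritten from the good mirror
   btrfs scrub start -B /mnt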

> If this feature is not going to mitigate hardware hiccups by design (as
> opposed to "not implemented yet, needs some time", which is perfectly
> understandable), just don't call it 'RAID'.
It shouldn't have been called RAID in the first place; that we can agree
on (even if for different reasons).

> All the features that currently work, like bit-rot mitigation for
> duplicated data (dup/raid*) using checksums, are something different
> from RAID itself.  RAID means "survive the failure of N
> devices/controllers" - I had one "RAID1" get stuck read-only after a
> degraded mount, which is not nice... and not _expected_ to happen after
> a single disk failure (without any device reappearing).
And that's a known bug on older kernels (not to mention that you should
not be mounting writable and degraded for any purpose other than fixing
the volume).
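For reference, "fixing the volume" after a single disk dies outright
usually amounts to something like the following; the device names,
mount point and devid are placeholders, so check 'btrfs filesystem
show' for the real values on your system:

   # mount the surviving member degraded, writable only for the repair
   mount -o degraded /dev/sda /mnt
   # identify the devid of the missing device (assumed to be 2 below)
   btrfs filesystem show /mnt
   # rebuild onto the new disk in place of the missing one
   btrfs replace start 2 /dev/sdc /mnt
   btrfs replace status /mnt
   # once the replace finishes, remount normally and scrub to verify
   umount /mnt && mount /dev/sda /mnt
   btrfs scrub start -B /mnt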
