Hi Crystal,

On Mon Sep 15, 2025 at 5:37 PM UTC, Crystal Kolipe wrote:
> On Mon, Sep 15, 2025 at 03:41:14PM +0000, H. Hartzer wrote:
>> This is a little bit frustrating as one usually uses RAID 1 to
>> improve reliability, not decrease it!
>
> Define "reliability".
A conceptual idea of how trustworthy something is.

> In simple terms, RAID 1 offers you _some_ protection against a complete disk
> failure, (traditionally the risk most envisaged was a head crash).
>
> If and only if the failure of one drive doesn't take down the whole machine,
> (or at least other drives in the mirror, as was common on multi-drop parallel
> SCSI), then you hopefully gain uptime. The example I usually give for this
> use-case would be media playout at a radio or tv station - you don't want to
> wait for 3 hours while you restore from a backup, you need that output to
> continue immediately.

This is true. In my case, the ideal would be AHCI hotplug support, so that a
drive could be replaced on the fly. But replacing a drive in a hotswap tray
with a single reboot isn't too bad, either.

I've seen a lot of drives that started getting bad blocks before catastrophic
failure. SMART tends to show these errors.

> However, (most implementations of), RAID 1 _increase_ your risk of silent data
> corruption, because they read round-robin from all of the disks in the array.

That is true, though most disks *should* know when a block's checksum is
invalid. I have seen RAID 1 arrays end up with mismatching blocks from one
drive to another, and I'm not 100% sure what causes it.

I think an ideal case for RAID 1 might be 3+ drives, where the correct block
is chosen by "majority wins". That would, however, lose the read throughput
benefits of RAID 1.

> If _any one_ of those disks returns bad data as good, then you read bad data
> in to the OS level.

Yes, that's true.

> Why are you using RAID in the first place? What are you trying to achieve?
> Do you actually have a use-case that would benefit from improved anything
> over what you get with a simple one SSD, using a normal FFS filesystem, and
> normal backups?

In my case, I was testing with SSDs on a baremetal provider just to quickly
simulate things without going out to the office.
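The "majority wins" idea could be sketched roughly like this — a toy read
path for a hypothetical 3-way mirror, not anything softraid actually
implements:

```python
from collections import Counter

def majority_read(blocks):
    """Given the same logical block as read from each mirror member,
    return the value that at least two members agree on.  If all
    copies differ, voting can't recover the block, so fail loudly
    instead of returning possibly-bad data."""
    counts = Counter(blocks)
    value, votes = counts.most_common(1)[0]
    if votes < 2:
        raise IOError("no majority: all mirror copies differ")
    return value

# One member returns a corrupted copy; the other two outvote it.
good = b"\x00" * 512
bad = b"\x00" * 511 + b"\x01"
assert majority_read([good, bad, good]) == good
```

The cost, as noted above, is that every logical read becomes a read from
every member, so the round-robin throughput benefit of RAID 1 is gone.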
I've been using more spinning platter hard drives as of late, which certainly
do get the occasional bad block. Hard drive failures are pretty much
guaranteed on a long enough timeline. On most 24/7 systems, it's more
convenient to swap a drive than to do a full reinstall. That's not to say
that backups aren't more important than RAID, but RAID can be a faster way to
get back online (or to never go offline in the first place).

I've been dabbling with 2.5" SATA hard drives, which use very little power.
Though they do seem much less reliable than 3.5" drives, I could run 3 or 4
of them on the same power budget if I were paranoid about drive reliability.
And two is sufficient for most cases.

> I fail to understand why there is such a desire by so many people to
> over-complicate things such as filesystems that are already, (in the case of
> FFS), complicated enough.
>
> In general, unless one has a specific use-case for RAID, or they are actually
> testing and developing the RAID code, then leave it out, (and that applies to
> any OS, not just OpenBSD).

One concern of mine, which I have not tested, is whether this easier
corruptibility also applies to the crypto discipline.

> Furthermore, there seem to have been a lot of scare stories recently about
> data loss on FFS, in various scenarios, but hard facts and reproducible steps
> are much more thin on the ground.

I was readily reproducing corruption to the point of panics, or at least to
the point of requiring a manual fsck. Syncing the Monero blockchain and then
cutting power reproduces it quite reliably.

I just thought it was very interesting and unusual that (as far as I've
tested) I had 100% reliability with sync mounts and no RAID 1, but with
RAID 1 -- even a single-drive array -- mounted sync, I could reproduce
issues easily.

You may well be right that RAID can overcomplicate things, but I feel like
RAID 1 should be possible without decreasing any metric of reliability
(other than drives possibly returning bad data -- which they *shouldn't* do,
due to checksums).
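On the "shouldn't, due to checksums" point: a drive's per-sector ECC detects
(and often corrects) bit errors before data ever reaches the host, so a bad
sector normally surfaces as a read error rather than as wrong bytes. The
detection side of that can be illustrated with a plain CRC32 — a toy sketch,
not the actual on-disk ECC scheme:

```python
import zlib

def write_sector(data):
    """Store a sector together with a checksum of its contents,
    roughly as a drive stores ECC alongside each physical sector."""
    return data, zlib.crc32(data)

def read_sector(data, stored_crc):
    """On read, recompute the checksum; a mismatch is reported as
    a read error instead of silently handing back bad data."""
    if zlib.crc32(data) != stored_crc:
        raise IOError("uncorrectable read error (checksum mismatch)")
    return data

data, crc = write_sector(b"some sector contents")
assert read_sector(data, crc) == data

# Flip one bit in the stored copy: the checksum catches it.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
try:
    read_sector(corrupted, crc)
except IOError:
    pass  # error surfaces to the host instead of bad data
```

Silent corruption in a mirror is the case where this detection fails or is
bypassed — the drive returns bad data as good, and round-robin reads then
hand it straight to the OS.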
Thank you for your reply!

-Henrich