On Tue, Jan 28, 2020 at 08:02:15PM +1100, russ...@coker.com.au wrote:

> Having a storage device fail entirely seems like a rare occurrence. The
> only time it happened to me in the last 5 years is a SSD that stopped
> accepting writes (reads still mostly worked OK).
It's not rare at all, but a drive doesn't have to be completely
non-responsive to be considered "dead". It just has to consistently cause
enough errors that the pool ends up degraded.

I recently had a Seagate IronWolf 4TB drive that would consistently cause
problems in my "backup" pool (8TB in two mirrored pairs of 4TB drives,
i.e. RAID-10, containing 'zfs send' backups of all my other machines).
Whenever it was under moderately heavy load, it would cause enough errors
to be kicked, degrading the pool. I didn't have a spare drive to replace
it immediately, so I just "zpool clear"-ed it several times. Running a
scrub on that pool with that drive was guaranteed to degrade the pool
within minutes.

And, yeah, I moved it around to different SATA & SAS ports just in case
it was the port and not the drive. Nope. It was the drive.

To me, that's a dead drive because it's not safe to use. It cannot be
trusted to reliably store data. It is junk. The only good use for it is
to scrap it for the magnets.

(And, BTW, that's why I use ZFS and used to use RAID. Without redundancy
from RAID-[156Z] or similar, such a drive would result in data loss. Even
worse, without the error detection and correction from ZFS, such a drive
would result in data corruption.)

> I've had a couple of SSDs have checksum errors recently and a lot of
> hard drives have checksum errors. Checksum errors (where the drive
> returns what it considers good data but BTRFS or ZFS regard as bad
> data) are by far the most common failures I see of the 40+ storage
> devices I'm running in recent times.

A drive that consistently returns bad data is not fit for purpose. It is
junk. It is a dead drive.

> BTRFS "dup" and ZFS "copies=2" would cover almost all storage hardware
> issues that I've seen in the last 5+ years.

IMO, two copies of data on a drive you can't trust isn't significantly
better or more useful than one copy.
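For anyone following along, the "clear, scrub, get kicked again, replace"
cycle described above looks roughly like this at the command line. This is
a sketch, not a transcript of my session: the pool name ("backup") and
device names (sdX/sdY/sdZ) are placeholders.

```shell
# Check pool health. A drive that keeps erroring shows non-zero
# READ/WRITE/CKSUM counts and the pool state drops to DEGRADED.
zpool status backup

# Clear the error counters and bring the kicked drive back into the
# pool (what I mean by '"zpool clear"-ed it several times').
zpool clear backup

# A scrub reads and verifies every block in the pool. On a flaky drive
# this reliably re-triggers the errors within minutes.
zpool scrub backup

# The real fix: replace the bad drive with a known-good one...
zpool replace backup sdX sdY

# ...or keep a hot spare attached so ZFS can swap it in automatically
# when a drive gets faulted.
zpool add backup spare sdZ
```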
It's roughly equivalent to making a photocopy of your important documents
and then putting both copies in the same soggy cardboard box in a damp
cellar.

If you want redundancy, use two or more drives. Store your important
documents in two or more different locations. And back up regularly.

> > If a drive is failing, all the read or write re-tries kill
> > performance on a zpool, and that drive will eventually be evicted
> > from the pool. Lose enough drives, and your pool goes from
> > "DEGRADED" to "FAILED", and your data goes with it.
>
> So far I haven't seen that happen on my ZFS servers. I have replaced
> at least 20 disks in zpools due to excessive checksum errors.

I've never had a pool go to FAILED state, either. I've had pools go to
DEGRADED *lots* of times. And almost every time it comes after massive
performance drops due to retries - which can be seen in the kernel logs.
Depending on the brand, you can also clearly hear the head re-seeking as
it tries again and again to read from the bad sector.

More importantly, it's not difficult or unlikely for a pool to go from
being merely DEGRADED to FAILED. A drive doesn't have to fail entirely
for it to be kicked out of the pool, and if you have enough drives kicked
out of a vdev (2 drives for a mirror or raidz-1, 3 for raidz-2, 4 for
raidz-3), then that entire vdev is FAILED, not just DEGRADED, and the
entire pool will likely be FAILED(*) as a result. That's what happens
when there are not enough working drives in a vdev to store the data
that's supposed to be stored on it.

And the longer you wait to replace a dead/faulty drive, the more likely
it is that another drive will die while the pool is degraded. Which is
why best practice is to replace the drive ASAP... and also why ZFS and
some other RAID/RAID-like HW & SW support "spare" devices to
automatically replace them.

(*) there are some pool layouts that are resistant (but not immune) to
failing - e.g.
a mirror of any vdev with redundancy, such as a mirrored pair of raidz
vdevs.

Which is why RAID of any kind is not a substitute for backups.

craig

--
craig sanders <c...@taz.net.au>

_______________________________________________
luv-main mailing list
luv-main@luv.asn.au
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main