On 1/10/24 09:07, Curt wrote:
> On 2024-01-10, David Christensen <dpchr...@holgerdanske.com> wrote:
>> Given the OP's situation -- 8 consumer SSD's, same make and model,
>> possibly from a defective manufacturing batch, all purchased at the same
>> time, all deployed in the same RAID-6, all run 2.5 years 24x7, and all
>> suddenly showing lots of SMART warnings -- I would not have confidence
>> in that RAID.
> It's curious, but I just heard something on French TV from a journalist
> that's relevant to this. She said she'd covered the aeronautics field in
> the past and mentioned the *principe de dissemblance* (dissimilarity
> principle). Critical redundant parts on aircraft, she claimed, would be
> sourced from different manufacturers in order to obviate the possibility
> of redundant failures you've raised here.
https://en.wikipedia.org/wiki/Failure_analysis
Using components from different vendors is a known mitigation technique
and can help in the right situations.
Relevant to this thread, some people use disk drives from different
manufacturers in their RAIDs. Doing so with RAID-10 (stripe of
mirrors) is straightforward -- within each mirror, use brand A for the
first disk, brand B for the second, brand C for the third (or hot
spares), etc. It then makes sense to do the same with HBAs -- use HBA
brand X for the first disks in each mirror, HBA brand Y for the second
disks, HBA brand Z for the third/spare disks, etc. Within a single x86
workstation or server, ECC memory, dual network interfaces, and dual
power supplies come to mind as further redundancy. I am unclear about
dual processors and/or dual memory banks. Moving beyond one computer,
the process continues with KVM/serial console fabric, networks,
electric power, cooling, etc. It's just a question of what failure
modes you want to protect against and how much time and money you want
to spend.
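
Here is a minimal sketch in Python of the brand/HBA pairing described above. It only plans a layout on paper and does not touch any disks or call any RAID tool. The device names, brand labels, HBA labels, and function names are all hypothetical placeholders, and it assumes every disk of a given brand hangs off the same HBA.

```python
def plan_mirrors(groups):
    """Build RAID-10 mirrors by taking one disk from each group.

    groups: list of (brand, hba, [devices]) tuples, one tuple per
    mirror position (first disk, second disk, third/spare disk, ...).
    All names are illustrative placeholders.
    """
    sizes = {len(devices) for _, _, devices in groups}
    if len(sizes) != 1:
        raise ValueError("every group needs the same number of disks")

    mirrors = []
    # zip(*lists) yields the i-th disk of every group, i.e. one mirror.
    for members in zip(*(devices for _, _, devices in groups)):
        mirrors.append([
            {"device": dev, "brand": brand, "hba": hba}
            for dev, (brand, hba, _) in zip(members, groups)
        ])
    return mirrors


def is_dissimilar(mirror):
    """True if no two disks in the mirror share a brand or an HBA."""
    brands = [disk["brand"] for disk in mirror]
    hbas = [disk["hba"] for disk in mirror]
    return len(set(brands)) == len(brands) and len(set(hbas)) == len(hbas)


# Hypothetical inventory: two brands, each on its own HBA.
groups = [
    ("brand-A", "hba-X", ["sda", "sdb"]),
    ("brand-B", "hba-Y", ["sdc", "sdd"]),
]
mirrors = plan_mirrors(groups)

for number, mirror in enumerate(mirrors):
    devices = [disk["device"] for disk in mirror]
    print(number, devices, "dissimilar:", is_dissimilar(mirror))
```

The same check can be run against an existing layout: build the list of dicts for each mirror by hand and pass it to is_dissimilar() to spot a mirror whose members share a brand or a controller.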
David