On 1/10/24 09:07, Curt wrote:
On 2024-01-10, David Christensen <dpchr...@holgerdanske.com> wrote:
Given the OP's situation -- 8 consumer SSD's, same make and model,
possibly from a defective manufacturing batch, all purchased at the same
time, all deployed in the same RAID-6, all run 2.5 years 24x7, and all
suddenly showing lots of SMART warnings -- I would not have confidence
in that RAID.

It's curious, but I just heard something on French TV from a journalist
that's relevant to this. She said she'd covered the aeronautics field in
the past and mentioned the *principe de dissemblance* (dissimilarity
principle). Critical redundant parts on aircraft, she claimed, would be
sourced from different manufacturers in order to obviate the possibility
of redundant failures you've raised here.


https://en.wikipedia.org/wiki/Failure_analysis


Using components from different vendors is a known mitigation technique and can help in the right situations.


Relevant to this thread, some people use disk drives from different manufacturers in their RAIDs. Doing so with RAID-10 (stripe of mirrors) is straightforward -- within each mirror, use brand A for the first disk, brand B for the second, brand C for the third (or hot spares), etc. It then makes sense to do the same with HBAs -- use HBA brand X for the first disks in each mirror, HBA brand Y for the second disks, HBA brand Z for the third/spare disks, etc.


For x86 workstations and servers, ECC memory, dual network interfaces, and dual power supplies come to mind. I am unclear about dual processors and/or dual memory banks. Moving beyond one computer, the process continues with KVM/serial console fabric, networks, electric power, cooling, etc. It's just a question of what failure modes you want to protect against and how much time and money you want to spend.
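

For anyone who wants to plan the RAID-10 and HBA layout described above on paper first, here is a rough Python sketch. It is only an illustration: the function names (build_mirrors, assign_hbas), the device names, and the brand/HBA labels are all made up for the example, and it does not talk to any real RAID tool. It places brand i at position i of every mirror, then attaches position i to HBA i.

```python
from collections import defaultdict


def build_mirrors(disks, width=2):
    """Group disks into mirrors so that no brand repeats within a mirror.

    `disks` is a list of (device_name, brand) tuples (hypothetical inventory).
    Position i of every mirror is filled from the i-th brand, sorted by name.
    """
    by_brand = defaultdict(list)
    for name, brand in disks:
        by_brand[brand].append(name)

    brands = sorted(by_brand)
    if len(brands) < width:
        raise ValueError("need at least %d distinct brands" % width)

    # One pool of disks per mirror position: brand A first, brand B second, ...
    pools = [by_brand[b] for b in brands[:width]]
    # The smallest pool limits how many complete mirrors can be built.
    count = min(len(pool) for pool in pools)
    return [
        [(pools[pos][m], brands[pos]) for pos in range(width)]
        for m in range(count)
    ]


def assign_hbas(mirrors, hbas):
    """Attach position i of every mirror to hbas[i] (hypothetical HBA labels)."""
    layout = []
    for mirror in mirrors:
        if len(mirror) > len(hbas):
            raise ValueError("need one HBA per mirror position")
        layout.append(
            [(disk, brand, hbas[pos]) for pos, (disk, brand) in enumerate(mirror)]
        )
    return layout


# Example inventory: four disks, two brands, two HBAs (all names assumed).
inventory = [("sda", "A"), ("sdb", "B"), ("sdc", "A"), ("sdd", "B")]
for mirror in assign_hbas(build_mirrors(inventory), ["X", "Y"]):
    print(mirror)
```

With this layout, losing every disk of one brand, or one whole HBA, takes out at most one side of each mirror. Disks left over after the smallest brand pool runs out are simply not placed; they could serve as hot spares.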


David
