Quoting Harald Arnesen ([email protected]):

> On 27.12.2017 19:34, [email protected] wrote:
>
> > Please remember that all RAID should have ECC RAM and when it comes to
> > XFS it is MANDATORY to avoid massive data corruption.
>
> And a UPS.

To summarise the summary of the summary, concerning the above: I think
many folks are not very good at understanding risk models.

ECC RAM is not sufficient to catch all bad RAM problems, only some.
Back in 2006, I had an interesting case of this:

http://linuxmafia.com/pipermail/conspire/2006-December/002662.html
http://linuxmafia.com/pipermail/conspire/2006-December/002668.html
http://linuxmafia.com/pipermail/conspire/2007-January/002743.html

I know most people won't bother to read that, so I'll summarise: My VA
Linux Systems 2230 2U that was my prototype next-deployment server
showed a perplexing pattern of spontaneous reboots, even though all
512MB of its RAM was in ECC SDRAM sticks on a server-grade,
ECC-supporting Intel L440GX+ 'Lancewood' motherboard. The RAM had also
passed long testing using memtest86. Yet something about the situation
still seemed to suggest one or more bad RAM sticks.

As related in the mailing list links, I found the bad RAM using only
logic and stubborn use of iterative kernel compiles with 'make -j NN'
cranked high enough to exercise all the RAM. (And no, it wasn't a bad
memory socket. I was able to eliminate that.)

There are also far more worrisome causes of filesystem corruption than
bad RAM, not even counting software problems. My one-time colleague
Ted Ts'o once wrote an excellent piece, which I can't find at the
moment, about how ext2/ext3 code had necessarily been written with a
defensive attitude, to compensate to the maximum possible extent for
the ways commodity PeeCee hardware tends to misbehave, e.g., the way
cheap HBAs often inadvertently write random garbage for a brief while
in the process of losing power when the system gets shut off. Ts'o
observed that this risk model from commodity hardware didn't exist on,
e.g., SGI hardware built to run IRIX, so the XFS filesystem code on
IRIX didn't need to protect against that form of loss, while ext2/ext3
did.
(It may be that XFS got improved in exactly that area in the years
since the Linux port. I haven't used it since I ran Debian on it
during 2001-2.)

Further food for thought:
https://nctritech.wordpress.com/2017/03/07/zfs-wont-save-you-fancy-filesystem-fanatics-need-to-get-a-clue-about-bit-rot-and-raid-5/

-- 
Cheers,              There are only 10 types of people in this world --
Rick Moen            those who understand binary arithmetic and those who don't.
[email protected]
McQ! (4x80)
