On Wed, 11 Aug 2021 02:53:13 -0700 David Christensen <dpchr...@holgerdanske.com> wrote:
> On 8/10/21 7:51 PM, Celejar wrote:
> > On Tue, 10 Aug 2021 17:35:32 -0700
> > David Christensen <dpchr...@holgerdanske.com> wrote:
> >
> >> On 8/10/21 12:56 PM, Dan Ritter wrote:
> >>> David Christensen wrote:
> >>>> On 8/10/21 8:04 AM, Leandro Noferini wrote:
> >>>>
> >>>> https://wiki.debian.org/ZFS
> >
> > ...
> >
> >>>> - ECC memory is safer than non-ECC memory.
> >>>
> >>> This is true, but there is nothing that makes ZFS more dangerous
> >>> than another filesystem using non-ECC memory.
> >>
> >> I think the amount of danger depends upon how you do your risk
> >> assessment math. I find used entry-level server hardware with ECC
> >> memory to be desirable for additional reasons.
> >
> > Dan's point is that while ECC memory is indeed safer than non-ECC
> > memory, this is true whether one is using ZFS or some other
> > filesystem; furthermore, with or without ECC memory, there's no
> > reason to believe that ZFS is less safe than the alternative.
> >
> > See:
> >
> > https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/?comments=1&post=38877683
> > https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/
> >
> > So while ECC memory is always good, it's not a consideration when
> > trying to choose between ZFS and other filesystems.
>
> I see two sets of choices:
>
> 1. Memory integrity:
>    a. No error checking or correcting -- non-ECC.
>    b. Error checking and correcting -- ECC.
>
> 2. Operating system storage stack data integrity:
>    a. No data integrity -- md, LVM, ext*, FAT, NTFS.
>    b. Data integrity -- dm-integrity, btrfs, ZFS.
>
> There are four combinations of the above. I order them from highest
> risk (A) to lowest risk (D) as follows:
>
> A. Non-ECC memory (1a) and data integrity (2b)
> B. Non-ECC memory (1a) and no data integrity (2a)
> C. ECC memory (1b) and no data integrity (2a)
> D. ECC memory (1b) and data integrity (2b)
>
> I have seen a few computers with failing non-ECC memory and no OS
> storage stack data integrity (case B). It might take weeks or months
> to identify the problem. If those computers had had OS storage stack
> data integrity with automatic correction (case A), the "scrub of
> death" would have been the logical outcome (failure modes and effects
> analysis); it would just have been a question of time. Given the
> eventual catastrophic outcome (fault hazard analysis), I see a
> significant difference in risk between A and B.

I myself have no personal experience or deep understanding of the
issues, but the experts do not accept your position that A is higher
risk than B due to the possibility of the "scrub of death." Here's Jim
Salter (from the second link I gave above):

> Is ZFS and non-ECC worse than not-ZFS and non-ECC? What about the
> Scrub of Death?
>
> OK, it’s pretty easy to demonstrate that a flipped bit in RAM means
> data corruption: if you write that flipped bit back out to disk,
> congrats, you just wrote bad data. There’s no arguing that. The real
> issue here isn’t whether ECC is good to have, it’s whether non-ECC is
> particularly problematic with ZFS. The scenario usually thrown out is
> the much-dreaded Scrub Of Death.
>
> TL;DR version of the scenario: ZFS is on a system with non-ECC RAM
> that has a stuck bit, its user initiates a scrub, and as a result of
> in-memory corruption good blocks fail checksum tests and are
> overwritten with corrupt data, thus instantly murdering an entire
> pool.
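Interjecting for a moment: Jim's first paragraph there is easy to see
in miniature. Here is a trivial Python illustration of my own --
nothing ZFS-specific, and the variable names are invented -- of the
practical difference between your 2a and 2b classes: without integrity
metadata a bit flipped in RAM is written back silently, while with a
stored checksum the damage is at least detectable:

import hashlib

data = b"important record"
stored_checksum = hashlib.sha256(data).digest()  # 2b keeps a checksum
in_ram = bytes([data[0] ^ 0x10]) + data[1:]      # a bit flips in RAM

# Case 2a (md, LVM, ext*, FAT, NTFS): nothing ever compares the data
# against anything, so writing in_ram back silently replaces good data.

# Case 2b (dm-integrity, btrfs, ZFS): the mismatch is caught on read.
print(hashlib.sha256(in_ram).digest() == stored_checksum)  # False

The open question is what a scrub then *does* about a detected
mismatch, which is what Jim goes on to address: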
> As far as I can tell, this idea originates with a very prolific user
> on the FreeNAS forums named Cyberjock, and he lays it out in this
> thread here. It’s a scary idea – what if the very thing that’s
> supposed to keep your system safe kills it? A scrub gone mad!
> Nooooooo!
>
> The problem is, the scenario as written doesn’t actually make sense.
> For one thing, even if you have a particular address in RAM with a
> stuck bit, you aren’t going to have your entire filesystem run
> through that address. That’s not how memory management works, and if
> it were how memory management works, you wouldn’t even have managed
> to boot the system: it would have crashed and burned horribly when it
> failed to load the operating system in the first place. So no, you
> might corrupt a block here and there, but you’re not going to wring
> the entire filesystem through a shredder block by precious block.
>
> But we’re being cheap here. Say you only corrupt one block in 5,000
> this way. That would still be hellacious. So let’s examine the more
> reasonable idea of corrupting some data due to bad RAM during a
> scrub. And let’s assume that we have RAM that not only isn’t working
> 100% properly, but is actively goddamn evil and trying its naive but
> enthusiastic best to specifically kill your data during a scrub:
>
> First, you read a block. This block is good. It is perfectly good
> data written to a perfectly good disk with a perfectly matching
> checksum. But that block is read into evil RAM, and the evil RAM
> flips some bits. Perhaps those bits are in the data itself, or
> perhaps those bits are in the checksum. Either way, your perfectly
> good block now does not appear to match its checksum, and since we’re
> scrubbing, ZFS will attempt to actually repair the “bad” block on
> disk. Uh-oh! What now?
>
> Next, you read a copy of the same block – this copy might be a
> redundant copy, or it might be reconstructed from parity, depending
> on your topology. The redundant copy is easy to visualize – you
> literally stored another copy of the block on another disk. Now, if
> your evil RAM leaves this block alone, ZFS will see that the second
> copy matches its checksum, and so it will overwrite the first block
> with the same data it had originally – no data was lost here, just a
> few wasted disk cycles. OK. But what if your evil RAM flips a bit in
> the second copy? Since it doesn’t match the checksum either, ZFS
> doesn’t overwrite anything. It logs an unrecoverable data error for
> that block, and leaves both copies untouched on disk. No data has
> been corrupted. A later scrub will attempt to read all copies of that
> block and validate them just as though the error had never happened,
> and if this time either copy passes, the error will be cleared and
> the block will be marked valid again (with any copies that don’t pass
> validation being overwritten from the one that did).
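To make that repair logic concrete, here is a minimal Python sketch of
the per-block scrub decision as I understand it. This is my own
simplification, not ZFS's actual code: scrub_block and its arguments
are invented for illustration, and I use SHA-256 because that is the
hash Jim's collision argument below assumes (ZFS's default checksum is
actually fletcher4):

import hashlib

def checksum(data):
    # Verification hash, per Jim's 256-bit SHA discussion below.
    return hashlib.sha256(data).digest()

def scrub_block(copies, stored_checksum):
    # copies: the redundant copies (or parity reconstructions) of one
    # block, as read through possibly-bad RAM.
    # stored_checksum: the checksum recorded when the block was written.
    good = [c for c in copies if checksum(c) == stored_checksum]
    if not good:
        # Every copy failed verification: log an unrecoverable error
        # and touch nothing on disk. A later scrub may still clear it.
        return "error logged, nothing overwritten"
    # At least one copy verified: rewrite any copy that failed, using
    # data that passed. This is the only path that writes to disk.
    for i, c in enumerate(copies):
        if checksum(c) != stored_checksum:
            copies[i] = good[0]
    return "repaired from a verified copy"

block = b"perfectly good data"
csum = checksum(block)
bad = bytes([block[0] ^ 1]) + block[1:]  # evil RAM flips a bit

print(scrub_block([bad, block], csum))   # repaired from a verified copy
print(scrub_block([bad, bad], csum))     # error logged, nothing overwritten

The thing to notice is that overwriting good data with bad would
require a corrupt copy that nonetheless *passes* the checksum test,
which is exactly the hash collision Jim turns to next: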
> Well, huh. That doesn’t sound so bad. So what does your evil RAM need
> to do in order to actually overwrite your good data with corrupt data
> during a scrub? Well, first it needs to flip some bits during the
> initial read of every block that it wants to corrupt. Then, on the
> second read of a copy of the block from parity or redundancy, it
> needs to not only flip bits, it needs to flip them in such a way that
> you get a hash collision. In other words, random bit-flipping won’t
> do – you need some bit flipping in the data (with or without some
> more bit-flipping in the checksum) that adds up to the corrupt data
> correctly hashing to the value in the checksum. By default, ZFS uses
> 256-bit SHA validation hashes, which means that a single bit-flip has
> a 1 in 2^256 chance of giving you a corrupt block which now matches
> its checksum. To be fair, we’re using evil RAM here, so it’s probably
> going to do lots of experimenting, and it will try flipping bits in
> both the data and the checksum itself, and it will do so multiple
> times for any single block. However, that’s multiple 1 in 2^256 (aka
> roughly 1 in 10^77) chances, which still makes it vanishingly
> unlikely to actually happen… and if your RAM is that damn evil, it’s
> going to kill your data whether you’re using ZFS or not.

... [snipped the rest of Jim's analysis]

> I don’t care about your logic! I wish to appeal to authority!
>
> OK. “Authority” in this case doesn’t get much better than Matthew
> Ahrens, one of the cofounders of ZFS at Sun Microsystems and current
> ZFS developer at Delphix. In the comments to one of my filesystem
> articles on Ars Technica, Matthew said “There’s nothing special about
> ZFS that requires/encourages the use of ECC RAM more so than any
> other filesystem.”
>
> Hope that helps. =)

Celejar
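P.S. If anyone wants to check how lopsided those odds are, the
arithmetic is easy to verify in Python (plain integer math, nothing
ZFS-specific; the billion-attempts-per-second figure is a deliberately
absurd assumption of mine):

chances = 2 ** 256          # possible values of a 256-bit checksum
print(chances)              # a 78-digit number, roughly 1.16e77

# Grant the "evil RAM" a billion corrupting attempts per second,
# around the clock, for a century:
attempts = 10 ** 9 * 60 * 60 * 24 * 365 * 100
print(attempts)             # roughly 3.15e18
print(attempts / chances)   # roughly 2.7e-59 expected collisions

Even under those cartoonish assumptions, the expected number of
successful collisions is indistinguishable from zero, which is Jim's
point.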