Duncan posted on Mon, 30 Oct 2017 04:09:58 +0000 as excerpted:

> Zak Kohler posted on Sun, 29 Oct 2017 21:57:00 -0400 as excerpted:
>
>> So I ran memtest86+ 5.01 for >4 days:
>> Pass: 39 Errors: 0
>
> Be aware that memtest86+ will detect some types of errors but not
> others.
>
> In particular, some years ago I had some memory (DDR1/3-digit-Opteron
> era), actually registered as required by Opterons and ECC, that passed
> that sort of memory test because what the test /tests/ is memory cell
> retention (if you put a value in does it verify on read-back?), that was
> none-the-less bad memory in that at its rated speed it was unreliable at
> memory /transfers/.
>
> Eventually that mobo got a BIOS update that could adjust memory
> clocking, and I downclocked it a notch[.]  At
> the lower clock it was rock stable, even with reduced wait-states to
> make up a bit of the performance I was losing to the lower clock.  But I
> had to get a BIOS that could do it, first [...]
>
> That's what really amazed me about reiserfs, that it remained stable
> thru not only that hardware problem but various others I've had over the
> years that would have killed or made unworkable other filesystems.

BTW, one of those other hardware problems I had, the one that ultimately
did in my old server-class mobo, was leaky capacitors on the then
8-ish-year-old system.  It was of the generation that had problem
capacitors, and it eventually succumbed...

The reason this is relevant is that it was the storage path that had the
worst problems.  Before I figured out what the problem actually was I did
try btrfs, with around kernel 3.6 at the time, and it really /was/
unusable due to checksum errors.  But I could still limp along with
reiserfs...

One thing I noticed about the behavior, however, was that as the problem
was developing, the system was more usable if I kept it reasonably cool.
By the time I gave up on it, it was early summer here in Phoenix, and
temperatures were climbing.  But I was sitting at home with the AC on, in
a heavy winter jacket, wearing sweats under my pants and trying to type
with gloves on my hands to keep warm, in order to cool down the computer
so it'd work.  That's when I decided enough was enough and gave up on it.

I only found the burst capacitors, however, once I got the new mobo and
was switching out the old one.  No WONDER it wasn't working right any
more!

The rest of the system was actually reasonably stable, however.  I guess
the worst caps were in the storage path.

But as I said, reiserfs was amazing.  I bought a SATA addon board with the
same chipset as on the old mobo so I could boot the old monolithic kernel
with those drivers builtin on the new mobo, and I didn't notice any
corruption or anything.
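
For anyone wanting to poke at the storage path directly instead of RAM,
here is a minimal Python sketch of a write/read-back check, in the same
spirit as the checksum errors described above.  It is not anything used
in this thread: the function name write_and_verify is hypothetical, and
the posix_fadvise call is only a hint to the kernel, so the read-back may
still be served from the page cache rather than the disk.  Treat a pass
as weak evidence, and a mismatch as a strong hint.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_mib=64, chunk=1 << 20):
    """Write random data under `directory`, read it back, compare SHA-256.

    Hypothetical helper for illustration only.  Returns True when the
    digest of what was read back matches the digest of what was written.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mib):
                block = os.urandom(chunk)
                written.update(block)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
            # Assumption: asking the kernel to drop cached pages makes the
            # read-back more likely to hit the disk.  It is only a hint,
            # and is not available on every platform.
            if hasattr(os, "posix_fadvise"):
                try:
                    os.posix_fadvise(f.fileno(), 0, 0,
                                     os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)


if __name__ == "__main__":
    ok = write_and_verify(tempfile.gettempdir(), size_mib=8)
    print("match" if ok else "MISMATCH")
```

Pointing the directory argument at a mount on the suspect controller, and
repeating the run with the machine warm and cool, would be one way to see
whether failures track the storage path and temperature.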

But as I said, btrfs was entirely unusable on the old hardware, due to
both checksum errors and large transactions (like trying to copy files
over from reiserfs) ending up entirely reverted when I'd crash.  On
reiserfs I'd get partial completion, so I could at least reboot and pick
up where it had crashed.  On btrfs I had to start over entirely, and thus
made no progress at all.

That reiserfs continued to work well enough to keep going so long under
those conditions, while btrfs was entirely unworkable, and even more that
once I was running good hardware again I didn't see massive corruption on
reiserfs as a result of running it so long on the bad hardware, was
really /really/ amazing!

But like I said, I don't expect that btrfs, with its checksumming, will
/ever/ be really workable on that sort of defective hardware.  It was
just really amazing to me that reiserfs wasn't screwed up by it as well,
as it had every right to be given the screwed up hardware I was trying to
run it on.  No filesystem can be expected to go thru that and end up
still usable, but somehow, reiserfs did.

Bottom line, it could be the storage path, not the memory or cpu.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman