Well, it looks like things have stabilized... for the moment at least.
$ sudo btrfs scrub start --offline --progress /dev/disk/by-id/XX3
Doing offline scrub [o] [681/681]
Scrub result:
Tree bytes scrubbed: 5234425856
Tree extents scrubbed: 638968
Data bytes scrubbed: 4353724284928
Data extents scrubbed: 374300
Data bytes without csum: 533200896
Read error: 0
Verify error: 0
Csum error: 0

$ sudo btrfs send /mnt/dataroot.2017.10.21 | pv -i2 > /dev/null
At subvol /mnt/dataroot.2017.10.21
1.55TiB 1:38:46 [ 283MiB/s] [ <=> ]

One interesting note is that when the --offline scrub came back with Csum errors, the Tree bytes scrubbed value was sometimes different:

Tree bytes scrubbed: 5234491392 #bad
vs
Tree bytes scrubbed: 5234425856 #good

The hardware is a Q6600 (the first Core 2 Quad, @2.4GHz) and a Dell PERC 6/i card flashed with IT mode.

*** 2 days have passed since I wrote the above ***

I checked my overclock and sure enough I had the FSB boosted, with the CPU reaching ~2.9 GHz. The PCI frequency was held constant, but I bet there was some bad interaction with the PERC. I don't know why the system chose to be stable 2 days ago, before I reset the overclock, but I am very confident it will stay that way now.

Takeaways:
1. I came to btrfs because I noticed bit flips occurring when comparing hashes manually. Now I have very likely found the source of the issues thanks to btrfs, and I can also be more confident against those issues in the future.
2. A stable memtest86+ doesn't necessarily mean a stable storage stack.
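
For anyone who wants to repeat the kind of manual hash comparison mentioned in takeaway 1, here is a minimal Python sketch. It is not the procedure used in this thread (which isn't described in detail). The helper names file_sha256, tree_hashes and changed_files are made up for illustration, and the choice of SHA-256 is an assumption. Keep in mind that re-reading the same files shortly after a first pass may be served from the page cache rather than the disk, so comparing against a separate copy, or after a remount, is probably more telling.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_hashes(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """Return relative paths whose digest differs, or that exist on one side only."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

A typical use would be to call tree_hashes() on the original data and on a copy (or on the same tree at two points in time) and pass both results to changed_files(). Any path it returns for files that were never intentionally modified is a candidate for silent corruption.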