On Thu, Jan 26, 2017 at 10:36:55AM +0100, Oliver Freyermuth wrote:
> Hi and thanks for the quick reply!
>
> On 26.01.2017 at 10:25, Hugo Mills wrote:
> > Can you post the output of "btrfs-debug-tree -b 35028992
> > /dev/sdb1", specifically the 5 or so entries around item 243. It is
> > quite likely that you have bad RAM, and the output will help confirm
> > that.
> >
>
> Since I did not find item 243 in the debug output at all, I uploaded the
> complete output of the debug-tree command here:
> http://pastebin.com/xM8qUnSx

   It's on line 248 of the paste:

246. key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
247. key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
248. key (15606380089319694336 UNKNOWN.76 303104) block 596459520 (36405) gen 20441
249. key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
250. key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427

   I was wrong in my assumption: this isn't a simple bitflip. It looks
like a small random write of data over the item key. That's not to say
that bad hardware isn't the culprit -- it's worth checking anyway --
but it could also be a bug in... well, almost anything.

   It's not corruption on the disk, because that would be caught by
the checksum mechanism. This data was corrupted in RAM, before it was
checksummed and written to disk. That could have happened as a result
of some rogue piece of kernel code writing to an incorrect address, or
as a result of some _other_ memory corruption affecting an address
that was then used as the destination of a write.

   Looking at the data, I think this should be manually fixable, with
sufficient effort (and a hex editor). Looking at the first field of
the damaged key:

   >>> hex(15606380089319694336)
   '0xd89500014da12000'

   Compared to the same field in the preceding key:

   >>> hex(5561905152)
   '0x14b83f000'

   It looks like it's just the top couple of bytes in this field that
are affected, so those (d8, 95) can be zeroed. The second field should
clearly be EXTENT_ITEM, which is 0xa8. The offset field (the third
one) looks OK to me -- the bottom byte is 0.

   We can probably talk you through fixing this by hand with a decent
hex editor. I've done it before...

> > Check and fix your hardware first. :)
> >
> > If it is bad RAM, then the error is likely to be a simple bitflip,
> > and there are patches for btrfs check which will fix those in most
> > cases.
>
> I'll schedule a memcheck as soon as I can turn off the machine for a while,
> which sadly may be a week or so in the future from now...

   Bear in mind that if it is unreliable hardware, then continued use
of the FS in read-write operation is likely to cause additional
damage.

   Hugo.

-- 
Hugo Mills             | This: Rock. You throw rock.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                  Graeme Swann on fast bowlers
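
For reference, the key repair proposed in the message (zero the top two bytes of the first field, set the type to 0xa8, leave the offset alone) can be sketched in Python as below. This is only an illustration of the arithmetic, not a repair tool. The 17-byte packed little-endian layout (64-bit objectid, 8-bit type, 64-bit offset) and the helper name pack_key are assumptions made for this sketch; the numbers are copied from lines 247 to 249 of the paste.

```python
import struct

EXTENT_ITEM = 0xA8  # key type value given in the message

# Damaged key from line 248 of the paste.
bad_objectid = 15606380089319694336
bad_type = 76          # shown as UNKNOWN.76 in the debug output
offset = 303104        # third field, believed to be intact

# Neighbouring keys (lines 247 and 249), used only as a sanity check.
prev_objectid = 5561905152
next_objectid = 5726711808

# Zero the top two bytes (d8, 95) of the 64-bit first field.
fixed_objectid = bad_objectid & 0x0000FFFFFFFFFFFF


def pack_key(objectid, key_type, off):
    """Hypothetical helper. ASSUMED layout: packed little-endian
    u64 objectid, u8 type, u64 offset (17 bytes in total)."""
    return struct.pack("<QBQ", objectid, key_type, off)


bad_bytes = pack_key(bad_objectid, bad_type, offset)
fixed_bytes = pack_key(fixed_objectid, EXTENT_ITEM, offset)

# Byte positions inside the key that a hex editor would have to change.
changed = [i for i in range(len(bad_bytes)) if bad_bytes[i] != fixed_bytes[i]]

print(hex(fixed_objectid))                             # 0x14da12000
print(prev_objectid < fixed_objectid < next_objectid)  # True
print(bad_bytes.hex())
print(fixed_bytes.hex())
print(changed)                                         # [6, 7, 8]
```

The repaired first field falls between its two neighbours, which is what a sorted run of keys would need, and under the assumed layout only three bytes of the key differ. Since the block was checksummed after the corruption happened, the block's checksum would presumably also have to be recomputed after any manual edit; the sketch does not cover that.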