On Thu, Mar 4, 2021 at 8:35 AM Sebastian Roller <sebastian.rol...@gmail.com> wrote:
>
> > I don't know. The exact nature of the damage of a failing controller
> > is adding a significant unknown component to it. If it was just a
> > matter of not writing anything at all, then there'd be no problem. But
> > it sounds like it wrote spurious or corrupt data, possibly into
> > locations that weren't even supposed to be written to.
>
> Unfortunately I cannot figure out exactly what happened. Logs end
> Friday night while the backup script was running -- which also
> includes a finalizing balance of the device. Monday morning, after
> some exchange of hardware, the machine came up unable to mount the
> device.
It's probably not discernible with logs anyway. What does hardware do
when it goes berserk? It's chaos. And all file systems have write
order requirements. It's fine if at a certain point writes just
abruptly stop going to stable media. But if things are written out of
order, or if the hardware acknowledges critical metadata writes as
written when they were actually dropped, it's bad. For all file
systems.

> OK -- I now had the chance to temporarily switch to 5.11.2. Output
> looks cleaner, but the error stays the same.
>
> root@hikitty:/mnt$ mount -o ro,rescue=all /dev/sdi1 hist/
>
> [ 3937.815083] BTRFS info (device sdi1): enabling all of the rescue options
> [ 3937.815090] BTRFS info (device sdi1): ignoring data csums
> [ 3937.815093] BTRFS info (device sdi1): ignoring bad roots
> [ 3937.815095] BTRFS info (device sdi1): disabling log replay at mount time
> [ 3937.815098] BTRFS info (device sdi1): disk space caching is enabled
> [ 3937.815100] BTRFS info (device sdi1): has skinny extents
> [ 3938.903454] BTRFS error (device sdi1): bad tree block start, want 122583416078336 have 0
> [ 3938.994662] BTRFS error (device sdi1): bad tree block start, want 99593231630336 have 0
> [ 3939.201321] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
> [ 3939.221395] BTRFS error (device sdi1): bad tree block start, want 124762809384960 have 0
> [ 3939.221476] BTRFS error (device sdi1): failed to read block groups: -5
> [ 3939.268928] BTRFS error (device sdi1): open_ctree failed

This looks like a super is expecting something that just isn't there
at all. If the spurious behavior lasted only briefly during the
hardware failure, there's a chance of recovery. But that chance
diminishes greatly if the chaotic behavior was ongoing for a while --
many seconds or a few minutes.

> I still hope that there might be some error in the fs created by the
> crash, which can be resolved, instead of real damage to all the data
> in the FS trees.
> I used a lot of snapshots and deduplication on that device, so I
> expect some damage from a hardware error. But I find it hard to
> believe that every file got damaged.

Correct. They aren't actually damaged. However, there's maybe 5-15 MiB
of critical metadata on Btrfs, and if it gets corrupt, the keys to the
maze are lost. It then becomes difficult, sometimes impossible, to
"bootstrap" the file system. There are backup entry points, but
depending on the workload they go stale in seconds to a few minutes,
and they can be subject to being overwritten.

When 'btrfs restore' does a partial recovery that ends up with a lot
of damage and holes, that tells me it's found stale parts of the file
system. It's on old rails, so to speak: there's nothing available to
tell it that a portion of the tree is just old and not valid anymore
(or only partially valid). Also, the restore code is designed to be
more tolerant of errors, because otherwise it would just do nothing at
all.

I think if you're able to find the most recent root node for a
snapshot you want to restore, along with an intact chunk tree, it
should be possible to get data out of that snapshot. The difficulty is
finding it, because it could be almost anywhere.

OK, so you said there's an original and a backup file system. Are they
both in equally bad shape, having been on the same controller? Are
they both btrfs? What do you get for

btrfs insp dump-s -f /dev/sdXY

There might be a backup tree root in there that can be used with

btrfs restore -t

Also, it's sometimes easier to do this on IRC on freenode.net in the
channel #btrfs.

-- 
Chris Murphy
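The dump-super / restore workflow suggested above can be sketched as
follows. This is a hedged example, not a command sequence from the
thread: /dev/sdXY stays a placeholder for the failed device, the
bytenr 123456789 stands in for whatever backup_tree_root values the
superblock actually reports, and /mnt/recovery is an assumed empty
directory on a separate, healthy disk.

```shell
# Dump the superblock in full (-f), including the backup_roots array.
# Each backup entry lists a backup_tree_root byte number.
btrfs inspect-internal dump-super -f /dev/sdXY

# Try a candidate tree root read-only first: -t takes the tree root
# bytenr, -D is a dry run (lists what would be restored, writes
# nothing), -v is verbose.
btrfs restore -t 123456789 -D -v /dev/sdXY /mnt/recovery

# If the dry run looks sane, restore for real to the healthy disk.
btrfs restore -t 123456789 -v /dev/sdXY /mnt/recovery
```

If one backup root yields mostly holes, it's worth repeating the dry
run with the other backup_tree_root values from the dump before
restoring anything.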