On Wed, Sep 20, 2017 at 02:53:57PM +0200, David Sterba wrote:
> On Tue, Sep 19, 2017 at 10:12:39AM -0600, Liu Bo wrote:
> > On Tue, Sep 19, 2017 at 05:07:25PM +0200, David Sterba wrote:
> > > On Tue, Sep 19, 2017 at 11:32:46AM +0000, Paul Jones wrote:
> > > > > This 'mirror 0' looks fishy, as mirror comes from
> > > > > btrfs_io_bio->mirror_num, which should be at least 1 if a raid1
> > > > > setup is in use.
> > > > >
> > > > > Not sure if 4.13.2-gentoo made any changes to btrfs, but can you
> > > > > please verify with the upstream kernel, say, v4.13?
> > > >
> > > > It's basically a vanilla kernel with a handful of unrelated
> > > > patches. The filesystem fell apart overnight: there were a few
> > > > thousand checksum errors and eventually it went read-only. I
> > > > tried to remount it, but got "open_ctree failed". Btrfs check
> > > > segfaulted, and lowmem mode completed with so many errors that I
> > > > gave up and will restore from the backup.
> > > >
> > > > I think I know the problem now - the lvm cache was in writeback
> > > > mode (by accident), so during a defrag there would be gigabytes
> > > > of unwritten data in memory only, all of which was lost when the
> > > > system crashed (motherboard failure). No wonder the filesystem
> > > > didn't quite survive.
> > >
> > > Yeah, the caching layer was my first suspicion, along with the
> > > lack of barrier propagation. Good that you were able to confirm
> > > that as the root cause.
> > >
> > > > I must say though, I'm seriously impressed by the data integrity
> > > > of BTRFS - there were nearly 10,000 checksum errors, 4 of which
> > > > were uncorrectable, and from what I could tell nearly all of the
> > > > data was still intact according to rsync checksums.
> > >
> > > Yay!
> >
> > But I still don't get why mirror_num is 0 - do you have an idea of
> > how the writeback cache could cause that?
>
> My first idea was that the cached blocks were zeroed, so we'd see the
> ino and mirror as 0.
> But this is not correct, as those blocks would not pass the checksum
> tests, so the blocks must be from some previous generation - i.e. the
> transid verify failure. And all the error reports appear after that,
> so I'm slightly suspicious about the way it's actually reported.
>
> btrfs_print_data_csum_error takes the mirror from either the io_bio
> or compressed_bio structures, so there might be a case where the
> structures are merely initialized but not yet updated: if the transid
> check is ok, the structures are updated; if the check fails, we'd see
> the initial mirror number. All of that is just a hypothesis, I
> haven't checked it against the code.
Thanks a lot for the input - you're right, mirror_num 0 should come
from a compressed read, where we don't record bbio->mirror_num but
instead use the mirror passed in from the upper layer. And it's not
metadata, since we don't compress metadata yet, so this all makes
sense. I think it also disables raid1 read-repair for compressed data,
and that's what caused the bug where it hits BUG_ON(mirror_num == 0)
in cleanup_io_failure(). The good news is that I can reproduce it; I
will send a patch and a testcase.

> I don't have a theoretical explanation for the ino 0. The inode
> pointer that goes to btrfs_print_data_csum_error should be from a
> properly initialized inode, and we print the number using btrfs_ino.
> That will use the vfs i_ino value and we should never get 0 out of
> that.

ino 0 comes from metadata read-repair; some cleanup may be needed to
make it less confusing.

thanks,
-liubo