On Wed, Sep 20, 2017 at 02:53:57PM +0200, David Sterba wrote:
> On Tue, Sep 19, 2017 at 10:12:39AM -0600, Liu Bo wrote:
> > On Tue, Sep 19, 2017 at 05:07:25PM +0200, David Sterba wrote:
> > > On Tue, Sep 19, 2017 at 11:32:46AM +0000, Paul Jones wrote:
> > > > > This 'mirror 0' looks fishy (as mirror comes from
> > > > > btrfs_io_bio->mirror_num, which should be at least 1 if a raid1
> > > > > setup is in use).
> > > > > 
> > > > > Not sure if 4.13.2-gentoo made any changes to btrfs, but can you
> > > > > please verify with the upstream kernel, say, v4.13?
> > > > 
> > > > It's basically a vanilla kernel with a handful of unrelated patches.
> > > > The filesystem fell apart overnight: there were a few thousand
> > > > checksum errors, and eventually it went read-only. I tried to
> > > > remount it, but got "open_ctree failed". Btrfs check segfaulted, and
> > > > lowmem mode completed with so many errors that I gave up and will
> > > > restore from the backup.
> > > > 
> > > > I think I know the problem now - the lvm cache was in writeback mode
> > > > (by accident), so during a defrag there would be gigabytes of
> > > > unwritten data held in memory only, all of which was lost when the
> > > > system crashed (motherboard failure). No wonder the filesystem
> > > > didn't quite survive.
> > > 
> > > Yeah, the caching layer was my first suspicion, along with a lack of
> > > barrier propagation. Good that you were able to confirm that as the
> > > root cause.
> > > 
> > > > I must say though, I'm seriously impressed with the data integrity
> > > > of BTRFS - there were nearly 10,000 checksum errors, 4 of which were
> > > > uncorrectable, and from what I could tell nearly all of the data was
> > > > still intact according to rsync checksums.
> > > 
> > > Yay!
> > 
> > But I still don't get why mirror_num is 0 - do you have any idea how
> > the writeback cache could cause that?
> 
> My first idea was that the cached blocks were zeroed, so we'd see the ino
> and mirror as 0. But this is not correct, as the blocks would not pass the
> checksum tests, so the blocks must be from some previous generation, i.e.
> the transid verify failure. And all the error reports appear after that,
> so I'm slightly suspicious about the way it's actually reported.
> 
> btrfs_print_data_csum_error takes the mirror from either the io_bio or
> compressed_bio structures, so there might be a case where the structures
> are only initialized: if the transid check is ok, the structures are
> updated; if the check fails, we'd see the initial mirror number. All of
> that is just a hypothesis, I haven't checked against the code.
>

Thanks a lot for the input. You're right, mirror_num 0 should come from the
compressed read path, which doesn't record bbio->mirror_num but instead
keeps the mirror passed in from the upper layer; and it can't be metadata,
as we don't compress metadata yet, so this all makes sense.
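
To make that concrete, here is a minimal userspace sketch of the suspected
flow (not the real btrfs code; the struct and function names below are only
stand-ins for btrfs_io_bio / compressed_bio):

/* sketch.c - models how a caller-supplied mirror of 0 can end up in the
 * csum error message; names are illustrative, not the kernel's. */
#include <stdio.h>

struct io_bio { int mirror_num; };          /* filled by the chunk layer, >= 1 */
struct compressed_bio { int mirror_num; };  /* kept from the caller */

static void submit_compressed_read(struct compressed_bio *cb,
				   int mirror_from_caller)
{
	/* the lower layer's bbio->mirror_num is never copied back into cb */
	cb->mirror_num = mirror_from_caller;
}

static void end_compressed_read(const struct compressed_bio *cb, int csum_ok)
{
	if (!csum_ok)
		printf("csum failed ... mirror %d\n", cb->mirror_num);
}

int main(void)
{
	struct compressed_bio cb;
	struct io_bio bbio = { .mirror_num = 1 };  /* the copy actually read */

	submit_compressed_read(&cb, 0);  /* initial read: upper layer passes 0 */
	end_compressed_read(&cb, 0);     /* prints "mirror 0", not bbio's 1 */
	(void)bbio;
	return 0;
}

Running this prints "csum failed ... mirror 0", matching what shows up in
the report even though a real copy (mirror 1) was read.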

I think it also disables the ability to do read-repair from raid1 for
compressed data, and that's what causes the bug where we hit
BUG_ON(mirror_num == 0) in cleanup_io_failure().
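
The same point as a toy model of the repair logic (again illustrative only;
repair_from_other_mirror() is made up and just assumes the 1-based mirror
numbering):

/* model of why a 0 mirror number breaks read-repair; illustrative only */
#include <assert.h>
#include <stdio.h>

static int repair_from_other_mirror(int failed_mirror, int num_copies)
{
	/* mirror numbers are 1-based; 0 means "no particular mirror", so we
	 * cannot tell which copy failed or which one to try next - this
	 * models the BUG_ON(mirror_num == 0) the compressed path hits */
	assert(failed_mirror != 0);
	return failed_mirror % num_copies + 1;	/* try the next copy */
}

int main(void)
{
	printf("retry mirror %d\n", repair_from_other_mirror(1, 2)); /* -> 2 */
	/* repair_from_other_mirror(0, 2) would trip the assert, as with the
	 * mirror 0 carried by compressed reads */
	return 0;
}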

The good news is that I can reproduce it; I will send a patch and a
testcase.

> I don't have a theoretical explanation for the ino 0. The inode pointer
> that goes to btrfs_print_data_csum_error should be from a properly
> initialized inode and we print the number using btrfs_ino. That will use
> the vfs i_ino value and we should never get 0 out of that.

The ino 0 comes from metadata read-repair; some cleanup may be needed to
make it less confusing.

thanks,
-liubo