On 20 Jun 2018, at 15:33, David Sterba wrote:

> On Wed, Jun 20, 2018 at 07:56:10AM -0700, Chris Mason wrote:
>> We've been hunting the root cause of data crc errors here at FB for a while. We'd find one or two corrupted files, usually displaying crc errors without any corresponding IO errors from the storage. The bug was rare enough that we'd need to watch a large number of machines for a few days just to catch it happening.
>>
>> We're still running these patches through testing, but the fixup worker bug seems to account for the vast majority of crc errors we're seeing in the fleet. It's cleaning pages that were dirty, and creating a window where they can be reclaimed before we finish processing the page.
>
> I'm having flashbacks when I see 'fixup worker',

Yeah, I don't understand how so much pain can live in one little function.

> and the test generic/208 does not make it better:
>
> generic/095 [18:07:03][ 3769.317862] run fstests generic/095 at 2018-06-20 18:07:03

Hmpf, I pass both 095 and 208 here.

> [ 3774.849685] BTRFS: device fsid 3acffad9-28e5-43ce-80e1-f5032e334cba devid 1 transid 5 /dev/vdb
> [ 3774.875409] BTRFS info (device vdb): disk space caching is enabled
> [ 3774.877723] BTRFS info (device vdb): has skinny extents
> [ 3774.879371] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 3774.885020] BTRFS info (device vdb): checking UUID tree
> [ 3775.593329] Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!
> [ 3775.596979] File: /tmp/scratch/file2 PID: 12031 Comm: kworker/1:1
> [ 3776.642812] Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!
> [ 3776.645041] File: /tmp/scratch/file2 PID: 12033 Comm: kworker/3:0
> [ 3776.920634] WARNING: CPU: 0 PID: 12036 at fs/btrfs/inode.c:9319 btrfs_destroy_inode+0x1d5/0x290 [btrfs]


Which warning is this in your tree? The file_write patch is more likely to have screwed up our bits and the fixup worker is more likely to have screwed up nrpages.

-chris