Hi,
I am running kernel v4.19.48 and found that it fails xfstests generic/538 on
xfs. The error output is as follows:
FSTYP -- xfs (non-debug)
PLATFORM -- Linux/x86_64 alinux2-6 4.19.48
MKFS_OPTIONS -- -f -bsize=4096 /dev/vdc
MOUNT_OPTIONS -- /dev/vdc /mnt/testarea/scra
generic/538 0s ... - output mismatch (see /root/usr/local/src/xfstests/results//generic/538.out.bad)
--- tests/generic/538.out 2019-05-27 13:57:06.505666465 +0800
+++ /root/usr/local/src/xfstests/results//generic/538.out.bad 2019-06-05 16:43:14.702002326 +0800
@@ -1,2 +1,10 @@
QA output created by 538
+Data verification fails
+Find corruption
+00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
+*
+00000200 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
+00002000
...
(Run 'diff -u /root/usr/local/src/xfstests/tests/generic/538.out /root/usr/local/src/xfstests/results//generic/538.out.bad' to see the entire diff)
Ran: generic/538
Failures: generic/538
Failed 1 of 1 tests
I also found that the latest upstream kernel (v5.2.0-rc2) passes generic/538.
I bisected and found that the first good commit is 3110fc79606. This commit
adds the hardware queue to the sort function. In addition, the sort function
now returns a non-positive value when the offset and queue (software and
hardware) of two I/O requests are the same. I think the second part of the
change makes sense: the kernel should not change the relative order of two I/O
requests when their offset and queue are the same. So I made the following
change and applied it to kernel 4.19.48. With this modification, generic/538
passes on xfs. The same test already passes on ext4, since ext4 has the
corresponding fix 0db24122bd7f ("ext4: fix data corruption caused by
overlapping unaligned and aligned IO"). Although I think xfs should be
responsible for this issue, the block layer code below is also problematic.
Any ideas?
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e563ee..a7309cd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1610,7 +1610,7 @@ static int plug_ctx_cmp(void *priv, struct list_head *a, struct list_head *b)
 	return !(rqa->mq_ctx < rqb->mq_ctx ||
 		 (rqa->mq_ctx == rqb->mq_ctx &&
-		  blk_rq_pos(rqa) < blk_rq_pos(rqb)));
+		  blk_rq_pos(rqa) <= blk_rq_pos(rqb)));
 }
 void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
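
For illustration only, here is a minimal userspace sketch of why "<" versus
"<=" matters for requests with the same queue and offset. It is not kernel
code. struct req, cmp_lt, cmp_le, sort_reqs and keeps_submission_order are
hypothetical names, and a simple insertion sort stands in for the kernel's
list sort, under the assumption that the sort keeps an earlier element first
whenever the comparison returns a value <= 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request: only what the comparison reads. */
struct req {
	int ctx;            /* models rq->mq_ctx (software queue) */
	unsigned long pos;  /* models blk_rq_pos(rq) */
	int seq;            /* submission order, used only to observe reordering */
};

/* Models the v4.19.48 comparison: yields 1 when ctx and pos are equal. */
static int cmp_lt(const struct req *a, const struct req *b)
{
	return !(a->ctx < b->ctx ||
		 (a->ctx == b->ctx && a->pos < b->pos));
}

/* Models the patched comparison: yields 0 when ctx and pos are equal. */
static int cmp_le(const struct req *a, const struct req *b)
{
	return !(a->ctx < b->ctx ||
		 (a->ctx == b->ctx && a->pos <= b->pos));
}

/*
 * Insertion sort that moves an element ahead of an earlier one only when
 * cmp(earlier, later) > 0. This is an assumption standing in for a stable
 * list sort; it is not the kernel implementation.
 */
static void sort_reqs(struct req *v, size_t n,
		      int (*cmp)(const struct req *, const struct req *))
{
	assert(v != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		struct req key = v[i];
		size_t j = i;

		while (j > 0 && cmp(&v[j - 1], &key) > 0) {
			v[j] = v[j - 1];
			j--;
		}
		v[j] = key;
	}
}

/* Returns 1 if two requests with the same ctx and pos keep submission order. */
static int keeps_submission_order(int (*cmp)(const struct req *,
					     const struct req *))
{
	struct req v[] = {
		{ .ctx = 0, .pos = 8, .seq = 0 },
		{ .ctx = 0, .pos = 8, .seq = 1 },
		{ .ctx = 0, .pos = 0, .seq = 2 },
	};

	sort_reqs(v, 3, cmp);
	/* The request at pos 0 sorts first; the two at pos 8 follow it. */
	return v[1].seq == 0 && v[2].seq == 1;
}
```

Calling keeps_submission_order(cmp_lt) models the unpatched comparison, and
keeps_submission_order(cmp_le) models the patched one. In this model only the
"<=" variant leaves the two requests at the same offset in the order in which
they were submitted.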
Best regards,
Alvin