* Dr. David Alan Gilbert ([email protected]) wrote:
> * Keith Busch ([email protected]) wrote:
> > On Mon, Jun 15, 2026 at 04:16:12PM -0700, Vjaceslavs Klimovs wrote:
> > > Your trace looks like what the two earlier reports hit: a read reaching
> > > a leaf device with sectors > 0 but phys_seg 0 (an empty bio). One aside
> > > that may help read the trace: blk_io_trace.error is a __u16, so the
> > > bracketed values on your C lines are errnos as u16 (65514 = -EINVAL,
> > > 65531 = -EIO).
> > >
> > > The WARN itself is new, the bad bio isn't. bio_add_page() only started
> > > rejecting len == 0 in 643893647cac ("block: reject zero length in
> > > bio_add_page()", v7.1-rc1); on 7.0.8 the same empty bio tripped
> > > scsi_alloc_sgtables()'s !nr_segs instead, which matches what you saw.
> > > That fits your "not a recent regression": the condition is older, v7.1
> > > just made it loud.
> > >
> > > For Tomas's and my reports (QEMU O_DIRECT to the LV block device) the
> > > origin looks like 5ff3f74e145a ("block: simplify direct io validity
> > > check", v6.18): blkdev_dio_invalid() now checks only aggregate
> > > ki_pos | count alignment and dropped the per-segment
> > > bdev_iter_is_aligned() walk, so a degenerate or misaligned O_DIRECT no
> > > longer gets -EINVAL at the fops boundary. But your reproducer reads a
> > > file, which goes through the filesystem O_DIRECT path and never calls
> > > blkdev_dio_invalid(), and still makes the empty bio. So it isn't only
> > > that one entry point.
> > >
> > > dm-mirror then hangs because Keith's f7b24c7b41f2 only covers md
> > > raid1/raid10; legacy dm-mirror (dm-raid1.c) has no equivalent and
> > > rebuilds the empty read onto the other leg. Note the leg's status isn't
> > > even consistent (your SATA path returns BLK_STS_IOERR, not
> > > BLK_STS_INVAL), so copying that status check into dm-mirror probably
> > > wouldn't catch every case.
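To make the aggregate versus per-segment distinction in the quoted text
concrete, here is a rough user-space model in Python. It is only a sketch:
the function names, the (addr, len) tuples standing in for iovec segments,
the 512-byte logical block size and the DMA mask of 511 are all assumptions
for illustration, not the kernel's actual code.

```python
# User-space model only; names, masks and segment layout are illustrative
# assumptions, not the kernel implementation.

def aggregate_ok(pos, segments, lbs=512):
    """Aggregate-style check: only (pos | total length) has to be a
    multiple of the logical block size."""
    total = sum(length for _addr, length in segments)
    return ((pos | total) & (lbs - 1)) == 0


def per_segment_ok(segments, lbs=512, dma_mask=511):
    """Per-segment walk: every segment's address and length must
    individually satisfy the alignment masks."""
    return all((addr & dma_mask) == 0 and (length & (lbs - 1)) == 0
               for addr, length in segments)


# Two segments whose lengths only add up to a block multiple.
segs = [(0x1000, 256), (0x2000, 768)]
print("aggregate check passes:  ", aggregate_ok(0, segs))
print("per-segment check passes:", per_segment_ok(segs))
```

In this model the request is accepted when only the total is inspected and
rejected when each segment is inspected, which is the kind of I/O that would
previously have been bounced with -EINVAL at the fops boundary.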
> > >
> > > For what it's worth, that points me toward rejecting the empty or
> > > misaligned bio once, at submission, with -EINVAL, rather than teaching
> > > each consumer to tolerate it. But you'll know the tradeoffs far better
> > > than I do.
> > >
> > > I have a small QEMU + LVM raid1/mirror setup that reproduces the
> > > block-device variant and bisects to 5ff3f74e. Happy to run your file
> > > reproducer with some instrumentation at the dm-mirror read entry
> > > (bi_size vs bio_sectors vs bvec lengths) to see whether the bio is
> > > already empty on arrival or built that way on the retry, and to test
> > > any patch.
> >
> > Thanks for following up here. I didn't initially see your follow-up
> > until Thorsten linked it. I apologize for missing that, this feature is
> > important so I don't want to see anything regress for it.
> >
> > There is a known bug fix I think future tests should include:
> >
> >   https://lore.kernel.org/linux-block/[email protected]/
> >
> > This likely isn't the fix you're looking for, but including it rules out
> > conditions that are not important here.
> >
> > After that, can we try this suggestion and see if the hang goes away?
> >
> >   https://lore.kernel.org/linux-block/ajBb8tK-0aJBpIgF@kbusch-mbp/
>
> With just that one in, the machine survives - thanks!
>
> It does give:
>
> [ 505.208354] device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
> [ 505.239376] device-mapper: raid1: All sides of mirror have failed.
> [ 505.239389] device-mapper: raid1: Read failure on mirror device 252:25. Failing I/O.
> [ 505.239394] device-mapper: raid1: Mirror read failed.
>
> Although as far as I can tell the RAID hasn't errored and is still in sync.
>
> If I turn the test case into a write (just s/pread/pwrite/ ) - the machine
> still survives but then it does lose raid sync, and the raid resync
> seems to stick until I do a 'lvchange --refresh main/lvol0'
> which recovers after having spat out a:
>
> [ 865.319527] Buffer I/O error on dev dm-26, logical block 262128, async page read
>
> > I expect the original test case to still return an error (and I think it
> > was designed to), but it shouldn't produce the warn or bug splats with a
> > stuck uninterruptible task.
>
> It's not clear to me if it was designed to fail or not; I've not had
> a chance to rerun the original qemu block tests yet, and I don't know
> if old kernels successfully used O_DIRECT in this case.
>
> It still feels that my pwrite case above shouldn't cause a raid de-sync
> (especially since a normal user can do it).

Just to follow up on that; if I use the modern lvm mode
  ( lvcreate -m 1 -L 1G main /dev/sda2 /dev/sdb2 )
rather than the old mirror with the same patch, then:

  a) I get no log errors with either read or write
  b) read still gives EIO
  c) write apparently succeeds ?!

Dave

-- 
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
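
For anyone decoding the bracketed values on the blktrace C lines mentioned
in the quoted text, a small Python helper can turn the u16 back into a
signed errno. This is just a sketch: the helper name is made up, and it
assumes the value really is a negative errno truncated to 16 bits, as the
quoted aside describes.

```python
import errno


def u16_to_errno(value):
    """Reinterpret a 16-bit unsigned value as a signed 16-bit integer."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    return value - 0x10000 if value & 0x8000 else value


# The two values quoted from the trace.
for raw in (65514, 65531):
    signed = u16_to_errno(raw)
    # errno.errorcode maps positive errno numbers to their symbolic names.
    name = errno.errorcode.get(-signed, "unknown")
    print(raw, signed, name)
```

On Linux this maps 65514 to -22 (EINVAL) and 65531 to -5 (EIO), matching
the values given in the quoted text.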
