Yes, that was exactly it. The patch works for raid1 logical volumes (these are dm-raid on top of md/raid1) but, for obvious reasons, it does not cover legacy mirror logical volumes, which still oops:

[    2.168054] device-mapper: raid1: Mirror read failed from 252:0. Trying alternative device.
[    2.169241] BUG: unable to handle page fault for address: fffff580045f4bc8
[    2.170256] #PF: supervisor read access in kernel mode
[    2.170997] #PF: error_code(0x0000) - not-present page
[    2.171706] PGD 7ff9d067 P4D 7ff9d067 PUD 7ff9c067 PMD 0
[    2.172433] Oops: Oops: 0000 [#1] SMP PTI
[    2.173003] CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.18.29+ #19 PREEMPT(lazy)
[    2.174118] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-20250108_150619-localhost 04/01/2014
[    2.175472] Workqueue: kmirrord do_mirror
[    2.176040] RIP: 0010:bio_add_page+0x8c/0x340
[    2.176676] Code: 07 4d 8b 48 08 41 f6 c1 01 0f 85 d6 01 00 00 0f 1f 44 00 00 4d 89 c1 49 8b 11 48 c1 ea 33 83 e2 07 83 fa 04 0f 84 bf 00 00 00 <48> 8b 56 08 4c 8d 4a ff f6 c2 01 75 08 0f 1f 44 00 00 49 89 f1 49
[    2.179169] RSP: 0018:ffffcea500063bc8 EFLAGS: 00010293
[    2.179933] RAX: 0000000000000001 RBX: ffff8d53149af400 RCX: 0000000000000580
[    2.180947] RDX: 0000000000000001 RSI: fffff580045f4bc0 RDI: ffff8d53149af488
[    2.181969] RBP: 0000000000000000 R08: fffff580005f4c00 R09: fffff580005f4c00
[    2.182978] R10: ffffcea500063c14 R11: 0000000000000a80 R12: ffff8d5303192a80
[    2.183997] R13: ffffcea500063c20 R14: 0000000000000001 R15: ffffcea500063cf8
[    2.185022] FS: 0000000000000000(0000) GS:ffff8d53ed4d5000(0000) knlGS:0000000000000000
[    2.186180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.187035] CR2: fffff580045f4bc8 CR3: 0000000002c44002 CR4: 0000000000372ef0
[    2.188047] Call Trace:
[    2.188417]  <TASK>
[    2.188756]  do_region+0x21d/0x270
[    2.189313]  dispatch_io+0xf1/0x150
[    2.189832]  ? __pfx_bio_get_page+0x10/0x10
[    2.190424]  ? __pfx_bio_next_page+0x10/0x10
[    2.191046]  dm_io+0x136/0x240
[    2.191503]  ? __pfx_read_callback+0x10/0x10
[    2.192108]  ? __pfx_bio_get_page+0x10/0x10
[    2.192708]  ? __pfx_bio_next_page+0x10/0x10
[    2.193319]  do_reads+0x13e/0x210
[    2.193807]  ? __pfx_read_callback+0x10/0x10
[    2.194411]  do_mirror+0x117/0x2a0
[    2.194912]  process_one_work+0x18d/0x340
[    2.195508]  worker_thread+0x196/0x300
[    2.196022]  ? __pfx_worker_thread+0x10/0x10
[    2.196617]  kthread+0xfc/0x240
[    2.197073]  ? __pfx_kthread+0x10/0x10
[    2.197606]  ? __pfx_kthread+0x10/0x10
[    2.198116]  ret_from_fork+0x158/0x170
[    2.198645]  ? __pfx_kthread+0x10/0x10
[    2.199161]  ret_from_fork_asm+0x1a/0x30
[    2.199736]  </TASK>
[    2.200053] Modules linked in:
[    2.200493] CR2: fffff580045f4bc8
[    2.200951] ---[ end trace 0000000000000000 ]---
[    2.201599] RIP: 0010:bio_add_page+0x8c/0x340
[    2.202193] Code: 07 4d 8b 48 08 41 f6 c1 01 0f 85 d6 01 00 00 0f 1f 44 00 00 4d 89 c1 49 8b 11 48 c1 ea 33 83 e2 07 83 fa 04 0f 84 bf 00 00 00 <48> 8b 56 08 4c 8d 4a ff f6 c2 01 75 08 0f 1f 44 00 00 49 89 f1 49
[    2.204690] RSP: 0018:ffffcea500063bc8 EFLAGS: 00010293
[    2.205390] RAX: 0000000000000001 RBX: ffff8d53149af400 RCX: 0000000000000580
[    2.206368] RDX: 0000000000000001 RSI: fffff580045f4bc0 RDI: ffff8d53149af488
[    2.207333] RBP: 0000000000000000 R08: fffff580005f4c00 R09: fffff580005f4c00
[    2.208297] R10: ffffcea500063c14 R11: 0000000000000a80 R12: ffff8d5303192a80
[    2.209257] R13: ffffcea500063c20 R14: 0000000000000001 R15: ffffcea500063cf8
[    2.210265] FS: 0000000000000000(0000) GS:ffff8d53ed4d5000(0000) knlGS:0000000000000000
[    2.211391] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.212201] CR2: fffff580045f4bc8 CR3: 0000000002c44002 CR4: 0000000000372ef0
[    2.213196] Kernel panic - not syncing: Fatal exception
[    2.214313] Kernel Offset: 0xc200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    2.215981] Rebooting in 10 seconds..

On Fri, May 15, 2026 at 10:10 PM Thorsten Leemhuis <[email protected]> wrote:
>
> On 5/15/26 18:52, Vjaceslavs Klimovs wrote:
> > Summary
> > -------
> > On v6.18, starting a libvirt/QEMU guest with virtio-blk backed by an
> > LVM "--type raid1" LV (drivers/md/dm-raid.c stacked on
> > drivers/md/raid1.c) makes md/raid1 register read failures at LV
> > sector 0 within seconds of "virsh start" and mark rimage_0 Faulty
> > once max_corrected_read_errors (default 20) is exceeded. Reads
> > succeed via the redirect path so guests boot, but every guest disk
> > ends up degraded on every VM start. Same workload on legacy
> > "--type mirror" (drivers/md/dm-raid1.c) crashes the host: a
> > zero-length READ reaches the NVMe controller, is rejected with
> > "Invalid Field in Command", and the dm-mirror recovery path oopses.
>
> That sounds somewhat like
> https://lore.kernel.org/all/2982107.4sosBPzcNG@electra/
>
> Have you tried latest 7.1-rc? It contains a fix for the problem
> mentioned in said thread: f7b24c7b41f23b ("md/raid1,raid10: don't fail
> devices for invalid IO errors") [v7.1-rc2]
>
> Ciao, Thorsten
>
> > Symptom on dm-raid raid1 (post --type raid1)
> > --------------------------------------------
> > Per LV, at virsh start, in host dmesg:
> >
> >   kernel: raid1_end_read_request: 95 callbacks suppressed
> >   kernel: raid1_read_request: 95 callbacks suppressed
> >   kernel: md/raid1:mdX: dm-58: rescheduling sector 0
> >   kernel: md/raid1:mdX: redirecting sector 0 to other mirror: dm-58
> >   kernel: md/raid1:mdX: dm-58: rescheduling sector 0
> >   kernel: md/raid1:mdX: redirecting sector 0 to other mirror: dm-58
> >   [... 10 rescheduling/redirecting pairs ...]
> >   kernel: md/raid1:mdX: dm-58: Raid device exceeded read_error
> >   threshold [cur 21:max 20]
> >   kernel: md/raid1:mdX: dm-58: Failing raid device
> >   kernel: md/raid1:mdX: Disk failure on dm-58, disabling device.
> >   kernel: md/raid1:mdX: Operation continuing on 1 devices.
> >
> >   dmeventd: WARNING: Device #0 of raid1 array, vg0-iris_boot, has failed.
> >   dmeventd: WARNING: Waiting for resynchronization to finish before
> >   initiating repair on RAID device vg0-iris_boot.
> >   dmeventd: Use 'lvconvert --repair vg0/iris_boot' to replace failed device.
> >
> > Subsequent "lvs -a":
> >
> >   WARNING: RaidLV vg0/iris_boot needs to be refreshed!
> >   See character 'r' at position 9 in the RaidLV's attributes and its
> >   SubLV(s).
> >
> > dmesg | grep nvme is EMPTY on this path. The NVMe driver is not
> > involved in producing the error; the failure originates between the
> > virtio-blk bio submission and raid1_end_read_request().
> >
> > Symptom on legacy dm-mirror (pre-conversion --type mirror)
> > ----------------------------------------------------------
> > Same workload on drivers/md/dm-raid1.c reaches the NVMe controller
> > as a zero-length READ and panics the host through dm-mirror's
> > recovery path:
> >
> >   kernel: operation not supported error, dev nvme1n1, sector 935446535
> >   op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
> >   kernel: nvme1n1: I/O Cmd(0x2) @ LBA 935446535, 0 blocks, I/O Error
> >   (sct 0x0 / sc 0x2)
> >   [... 10+ identical bursts at same timestamp ...]
> >   dmeventd: Primary mirror device 252:58 read failed.
> >   dmeventd: vg0-iris_boot is now in-sync.
> >   [kernel oops in dm_mirror recovery path, full trace lost to console flash]
> >
> > The "phys_seg 0", "0 blocks", "sct 0x0/sc 0x2" trio (NVMe Generic,
> > Invalid Field in Command, NVMe spec 4.1.1.2) is unambiguous: a bio
> > with bi_iter.bi_size == 0 and bi_vcnt == 0 left the block layer and
> > hit the controller. dm-raid raid1 hides this by retrying on the
> > surviving leg, but the upstream-of-md trigger is identical.
> >
> > Bisect
> > ------
> > git bisect, v6.12..v6.18, 16 deterministic GOOD/BAD steps, no skips,
> > ~104 minutes:
> >
> >   5ff3f74e145adc79b49668adb8de276446acf6be is the first bad commit
> >   block: simplify direct io validity check
> >
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -38,8 +38,8 @@ static blk_opf_t dio_bio_write_op(struct kiocb *iocb)
> >  static bool blkdev_dio_invalid(struct block_device *bdev, struct kiocb *iocb,
> >  				struct iov_iter *iter)
> >  {
> > -	return iocb->ki_pos & (bdev_logical_block_size(bdev) - 1) ||
> > -		!bdev_iter_is_aligned(bdev, iter);
> > +	return (iocb->ki_pos | iov_iter_count(iter)) &
> > +		(bdev_logical_block_size(bdev) - 1);
> >  }
> >
> > The dropped bdev_iter_is_aligned() used to walk the iov_iter and
> > reject per-segment misaligned/degenerate vectors at the blkdev fops
> > entry point. The replacement only validates ki_pos and total length
> > against the logical block size. Cases that now pass that no longer
> > get rejected:
> >
> > - iter with iov_iter_count(iter) == 0 (degenerate; total length is
> >   "sector-aligned" since 0 % 512 == 0)
> > - iter where total length is sector-aligned but a segment isn't
> >
> > The commit message justifies the removal with "The block layer
> > checks all the segments for validity later". This is true for the
> > io_uring submit path (which enters __blkdev_direct_IO directly and
> > does its own validation) but not for the libaio aio_read/write_iter
> > or the worker-pool sync read/write_iter paths that enter via
> > blkdev_{read,write}_iter() -> blkdev_dio_invalid(). For those paths,
> > the segment check has no replacement.
> >
> > Reproducing
> > ----------------------------------------------------------
> >
> > The trigger requires QEMU virtio-blk's specific submission shape AND
> > a non-io_uring submit. Userspace libaio alone, userspace
> > preadv-in-a-thread alone, and QEMU's raw-driver open probes (which
> > qemu-img info exercises identically) are all insufficient. The
> > combination that hits the bug is "guest-driven I/O through
> > virtio-blk-pci with cache.direct=on and aio in {native, threads}".
> >
> > #regzbot introduced: 5ff3f74e145adc79b49668adb8de276446acf6be
> >
> > Thanks,
> > Vjaceslavs Klimovs
> >
>
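
For anyone following along, here is a small userspace sketch of the arithmetic in the bisected hunk quoted above. It is not kernel code: the helper names (iov_total, new_check_invalid, segment_len_misaligned) are made up for illustration, struct iovec stands in for the kernel's iov_iter, and the per-segment helper only looks at segment lengths, which is a simplifying assumption and not a reimplementation of bdev_iter_is_aligned(). It only shows that the position/total-length mask accepts the two shapes listed in the report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Total byte count of an iovec array (userspace stand-in for iov_iter_count()). */
static size_t iov_total(const struct iovec *iov, int nr)
{
	size_t total = 0;

	for (int i = 0; i < nr; i++)
		total += iov[i].iov_len;
	return total;
}

/*
 * Same shape as the "+" side of the quoted hunk: only the file position and
 * the total length are masked against the logical block size (lbs, assumed
 * to be a power of two).
 */
static bool new_check_invalid(uint64_t pos, const struct iovec *iov, int nr,
			      unsigned int lbs)
{
	return ((pos | iov_total(iov, nr)) & (lbs - 1)) != 0;
}

/*
 * Hypothetical per-segment check, lengths only: returns true if any single
 * segment is not a multiple of the logical block size.
 */
static bool segment_len_misaligned(const struct iovec *iov, int nr,
				   unsigned int lbs)
{
	for (int i = 0; i < nr; i++)
		if (iov[i].iov_len & (lbs - 1))
			return true;
	return false;
}
```

With a 512-byte logical block size, a zero-length vector and a 256+256 byte two-segment vector both get through new_check_invalid() at position 0, while segment_len_misaligned() flags the second one.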

