On Tue, 15 Apr 2025, Damien Le Moal wrote:
> > Hi
> >
> > I looked at the generic device mapper code and it seems that ordering of
> > write bios is not guaranteed with any target in case of suspend/resume.
> >
> > * we suspend the device:
> > * received bios are added to md->deferred in queue_io
> >
> > * we resume the device:
> > * __dm_resume calls dm_queue_flush
> > * dm_queue_flush clears DMF_BLOCK_IO_FOR_SUSPEND and submits work item
> > &md->work (dm_wq_work)
> > * dm_resume clears DMF_SUSPENDED
> > * the device starts accepting new bios in dm_submit_bio
> > * dm_wq_work runs concurrently with new bios that are received, so
> > ordering of bios is not preserved
> >
> > So it doesn't make much sense to try to fix it in dm-delay, if it isn't
> > supposed to work at all.
>
> Just need to fix the generic DM resume code then. This patch fixing dm-delay
> is still relevant even with DM generic resume fixes.
>
> I can resend the dm-delay fix together with DM core resume fixes. And Benjamin
> can re-send the dm-delay kthread timer cleanup independently (I will rebase)
> or on top of that fix series. Does that work for you ?

I would like to know why this is needed. With a zoned device, you can send
one big write bio, wait for it to finish, send another big write bio, wait
for it to finish, and so on. Then there will be at most one write bio
outstanding and you don't have to care about the kernel reordering
in-flight bios.
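
To make the "one write outstanding" pattern concrete, here is a rough
user-space sketch in Python. It is only an illustration under assumptions:
the function name write_one_at_a_time is made up, an ordinary file
descriptor stands in for the zoned device, and whether each pwrite ends up
as a single bio depends on the kernel I/O path, which this sketch does not
control.

```python
import os

def write_one_at_a_time(fd, chunks, offset=0):
    """Write chunks strictly in order, with at most one write outstanding.

    Hypothetical helper: 'fd' is any writable file descriptor standing in
    for the zoned device. Returns the offset after the last byte written.
    """
    for chunk in chunks:
        view = memoryview(chunk)
        while view:
            # pwrite may write fewer bytes than requested; continue from
            # where it stopped so the data stays sequential.
            written = os.pwrite(fd, view, offset)
            view = view[written:]
            offset += written
        # Wait for this chunk to reach the device before issuing the next.
        os.fsync(fd)
    return offset
```

Because the next chunk is not issued until the previous one has completed,
there is nothing in flight that could be reordered.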

It seems that you want to send many small overlapping write bios - the
question is why? Why can't the application accumulate the content and send
it as one big bio?

I'm a bit worried that supporting this ordering will just bloat the kernel
with marginal benefit.

Mikulas