On Wed, Mar 26, 2025 at 11:17:24AM -0400, Damien Le Moal wrote: > On 2025/03/26 11:00, Benjamin Marzinski wrote: > > On Wed, Mar 26, 2025 at 08:55:48AM -0400, Damien Le Moal wrote: > >> On 2025/03/21 13:52, Benjamin Marzinski wrote: > >>> On Fri, Mar 21, 2025 at 08:18:16AM +0100, Christoph Hellwig wrote: > >>>> Add support for zoned device by passing through report_zoned to the > >>>> underlying read device. > >>>> > >>>> This is required to make enable xfstests xfs/311 on zoned devices. > >>> > >>> On suspend, delay_presuspend() stops delaying and it doesn't guarantee > >>> that new bios coming in will always be submitted after the delayed bios > >>> it is flushing. That can mess things up for zoned devices. I didn't > >>> check if that matters for the specific test. Setting > >>> > >>> ti->emulate_zone_append = true; > >>> > >>> would enforce write ordering, at the expense of adding a whole other > >>> layer of delays to zoned dm-delay devices. Since this isn't really > >>> useful outside of testing, I think that could be acceptable if necessary > >>> (it would require us to support table reloads of zoned devices with > >>> emulated zone append, since tests often want to change the delay). > >>> However it would probably be better to see if we can just make dm-delay > >>> preserve write ordering during a suspend. > >> > >> delay_presuspend() calls flush_delayed_bios() with flush_all == true. So > >> all > >> BIOs will be flushed in the order they are queued in the delay list, which > >> as > >> far as I can tell is the order in which the user of dm-delay issued the > >> BIOs. So > >> for writes, the order is preserved as far as I can tell. > > > > delay_presuspend() is called before we set the DMF_BLOCK_IO_FOR_SUSPEND > > bit, which will stop incoming bio from getting mapped, and also before > > lock_fs() is called. This means it's common for new bios to continue to > > come into delay_map(), while delay_presuspend() is running. The moment > > delay_presuspend() sets dc->may_delay = false, those new bios will stop > > getting queued by delay_bio(). They will get remapped immeditately to > > the underlying device. flush_delayed_bios() doesn't even get called > > until after dc->may_delay is set to false, and if there are a lot of > > bios on the delayed_bios list, flush_delayed_bios() will schedule. So, > > it's actually very common for new incoming bios to get passed to > > underlying device before all the bios on the dc->delayed_bios list do. > > > > Solving this without grabbing the dc->process_bios_lock mutex for every > > bio sent to dm-delay probably involves keeping the incoming bios going > > to dc->delayed_bios during suspend, at least until we can guarantee that > > it's empty and no bios are being flushed. > > OK. Understood. Thank you for the explanation. And the above sounds like a > rather simple solution, which does not even needs to be zone specific. > > I also think that this is orthogonal to Christoph patch and we can fix the > suspend issue on top of Christoph's patch. This is a very niche issue anyway > for > the main target use case which is fstests, since fstests does not > suspend/resume > the dm-delay device as far as I know.
I'm not sure that I would call a patch that adds a feature which doesn't work correctly in some situations orthogonal to making the feature work correctly in those situations. But I'll grant you that dm-delay is a testing target, and Christoph's patch makes in useful for more tests, as long as they don't suspend the device while there is still outstanding IO. So, sure. I agree that Christoph doesn't need to change his patch, and we can fix the suspend issue in a separate one. How that all gets merged is Mikulas's call. -Ben > > > > > -Ben > > > >> > >> -- > >> Damien Le Moal > >> Western Digital Research > > > > > -- > Damien Le Moal > Western Digital Research