On 2025/03/26 11:00, Benjamin Marzinski wrote:
> On Wed, Mar 26, 2025 at 08:55:48AM -0400, Damien Le Moal wrote:
>> On 2025/03/21 13:52, Benjamin Marzinski wrote:
>>> On Fri, Mar 21, 2025 at 08:18:16AM +0100, Christoph Hellwig wrote:
>>>> Add support for zoned device by passing through report_zoned to the
>>>> underlying read device.
>>>>
>>>> This is required to make enable xfstests xfs/311 on zoned devices.
>>>
>>> On suspend, delay_presuspend() stops delaying and it doesn't guarantee
>>> that new bios coming in will always be submitted after the delayed bios
>>> it is flushing. That can mess things up for zoned devices. I didn't
>>> check if that matters for the specific test. Setting
>>>
>>> ti->emulate_zone_append = true;
>>>
>>> would enforce write ordering, at the expense of adding a whole other
>>> layer of delays to zoned dm-delay devices. Since this isn't really
>>> useful outside of testing, I think that could be acceptable if necessary
>>> (it would require us to support table reloads of zoned devices with
>>> emulated zone append, since tests often want to change the delay).
>>> However it would probably be better to see if we can just make dm-delay
>>> preserve write ordering during a suspend.
>>
>> delay_presuspend() calls flush_delayed_bios() with flush_all == true. So all
>> BIOs will be flushed in the order they are queued in the delay list, which as
>> far as I can tell is the order in which the user of dm-delay issued the 
>> BIOs. So
>> for writes, the order is preserved as far as I can tell.
> 
> delay_presuspend() is called before we set the DMF_BLOCK_IO_FOR_SUSPEND
> bit, which will stop incoming bio from getting mapped, and also before
> lock_fs() is called. This means it's common for new bios to continue to
> come into delay_map(), while delay_presuspend() is running.  The moment
> delay_presuspend() sets dc->may_delay = false, those new bios will stop
> getting queued by delay_bio(). They will get remapped immeditately to
> the underlying device. flush_delayed_bios() doesn't even get called
> until after dc->may_delay is set to false, and if there are a lot of
> bios on the delayed_bios list, flush_delayed_bios() will schedule. So,
> it's actually very common for new incoming bios to get passed to
> underlying device before all the bios on the dc->delayed_bios list do.
> 
> Solving this without grabbing the dc->process_bios_lock mutex for every
> bio sent to dm-delay probably involves keeping the incoming bios going
> to dc->delayed_bios during suspend, at least until we can guarantee that
> it's empty and no bios are being flushed.

OK. Understood. Thank you for the explanation. And the above sounds like a
rather simple solution, which does not even needs to be zone specific.

I also think that this is orthogonal to Christoph patch and we can fix the
suspend issue on top of Christoph's patch. This is a very niche issue anyway for
the main target use case which is fstests, since fstests does not suspend/resume
the dm-delay device as far as I know.

> 
> -Ben
> 
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 


-- 
Damien Le Moal
Western Digital Research

Reply via email to