On 4/10/25 4:54 PM, Christoph Hellwig wrote:
> On Thu, Apr 10, 2025 at 01:33:11PM +0900, Damien Le Moal wrote:
>> When a dm-delay devie is being suspended, the .presuspend() operation is
> 
> s/devie/device/
> 
>> first executed (delay_presuspend()) to immediately issue all the BIOs
>> present in the delayed list of the device and also sets the device
>> may_delay boolean to false. At the same time, any new BIO is issued to
>> the device will not be delayed and immediately issued with delay_bio()
>> returning DM_MAPIO_REMAPPED. This creates a situation where potentially
>> 2 different contexts may be issuing write BIOs to the same zone of a
>> zone device without respecting the issuing order from the user, that is,
>> a newly issued write BIO may be issued before other write BIOs for the
>> same target zone that are in the device delayed list. If such situation
>> occurs, write BIOs may be failed by the underlying zoned device due to
>> an unaligned write error.
>>
>> Prevent this situation from happening by always handling newly issued
>> write BIOs using the delayed list of BIOs, even when the device is being
>> suspended. This is done by forcing the use of the worker kthread for
>> zoned devices, and by modifying flush_worker_fn() to always flush all
>> delayed BIOs if the device may_delay boolean is false.
> 
> Is that scenario specific to dm-delay?  I think the same applies to
> any other target that supports passing on bios to zoned devices.

Don't think so. The delayed BIO list is a dm-delay thing only. dm-linear,
dm-error or dm-crypt do not delay BIOs like dm-delay so we do not have BIOs
being issued in the pre-suspend context.

> 
>>      spin_lock(&dc->delayed_bios_lock);
>>      if (unlikely(!dc->may_delay)) {
>> -            spin_unlock(&dc->delayed_bios_lock);
>> -            return DM_MAPIO_REMAPPED;
>> +            /*
>> +             * Issue the BIO immediately if the device is not zoned. FOr a
>> +             * zoned device, preserver the ordering of write operations by
>> +             * using the delay list.
>> +             */
>> +            if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
>> +                    spin_unlock(&dc->delayed_bios_lock);
>> +                    return DM_MAPIO_REMAPPED;
> 
> Don't we only need to do that if anything is queued up due to a
> suspension?  Then again having a different patch might be premature
> optimization, it's not like anyone cares about dm-delay performance
> almost by definition :)

True. I can add a list_empty check here. Will send a v2 with that.


-- 
Damien Le Moal
Western Digital Research

Reply via email to