On 4/22/25 20:27, Mikulas Patocka wrote:
> 
> 
> On Fri, 18 Apr 2025, Damien Le Moal wrote:
> 
>>> It seems that you want to send many small overlapping write bios - the 
>>> question is why? Why can't the application accumulate the content and send 
>>> it as one big bio?
>>
>> That is the application's problem. On HDDs at least, small IOs will hurt
>> performance. SMR or not, same problem. Intelligent applications will try to
>> shape their workload to optimize performance. But that point is irrelevant 
>> here.
>> The kernel provides a service: processing write requests. Regardless of how
>> big these requests are, if they are correct (i.e. for zoned devices, they are
>> issued in order by the user), we must correctly execute the writes.
>>
>>> I'm a bit worried that supporting this ordering will just bloat the kernel 
>>> with marginal benefit.
>>
>> Bloat ?
> 
> We would need three states instead of two: normal, suspended, resuming (so 
> it would bloat all the device mapper logic with another state). There's 
> dm_wq_work using submit_bio_noacct, which wouldn't work, as it would 
> immediately enqueue the bio for suspend again, so we would need some 
> other path to submit the bio.
> 
> dm_wq_work would have to transition the device from the "resuming" state 
> to the "normal" state when it processes all the bios, but it is called for 
> various other reasons too.

Just because you do not see a clean solution does not mean there is not one. So
unless you have completely made up your mind already and are not willing to
accept any change in this area to improve things, I will dig into this and find
a solution that is not "bloat".

>> everything is already in place to preserve the order of write operations
>> to zoned devices, and has been for a long time.
> 
> What if the controller doesn't preserve the order of writes? I think that 
> there was some bit for that, but I forgot its name. So we can simply not 
> set the bit for device mapper - and the applications will have to deal 
> with it by using write plugging.

I do not understand what you are talking about. A zoned DM device is zoned
because it is on top of a zoned device. That bottom zoned device may be another
DM target or a real zoned device. For the real zoned device, zone write plugging
is always used so it does not matter if the host controller does or does not
preserve command order. There will always be at most 1 in-flight write per zone,
which makes command reordering completely irrelevant to the success of write
commands.
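
To illustrate the argument, here is a minimal toy model in Python, not kernel
code. The names (PluggedDevice, submit, complete_one, drain) are hypothetical
and exist only for this sketch. It assumes a device that may complete
in-flight commands in any order and that fails any write that does not land
exactly on the zone write pointer. Because at most one write per zone is ever
in flight, the arbitrary completion order never causes a failure.

```python
import random
from collections import deque


class PluggedDevice:
    """Toy model of zone write plugging (hypothetical, not a kernel API)."""

    def __init__(self, nr_zones, seed=0):
        self.wp = [0] * nr_zones                      # write pointer per zone
        self.plugged = [deque() for _ in range(nr_zones)]
        self.in_flight = {}                           # zone -> (offset, length)
        self.rng = random.Random(seed)

    def submit(self, zone, offset, length):
        # If the zone already has a write in flight, hold this one back.
        if zone in self.in_flight:
            self.plugged[zone].append((offset, length))
        else:
            self.in_flight[zone] = (offset, length)

    def complete_one(self):
        # The "controller" picks any in-flight command: arbitrary reordering.
        zone = self.rng.choice(sorted(self.in_flight))
        offset, length = self.in_flight.pop(zone)
        if offset != self.wp[zone]:
            raise IOError(f"unaligned write in zone {zone}")
        self.wp[zone] += length
        # Unplug: issue the next held-back write for this zone, if any.
        if self.plugged[zone]:
            self.in_flight[zone] = self.plugged[zone].popleft()

    def drain(self):
        while self.in_flight:
            self.complete_one()


dev = PluggedDevice(nr_zones=2, seed=1)
for i in range(8):
    dev.submit(0, i * 8, 8)   # sequential 8-unit writes to zone 0
    dev.submit(1, i * 4, 4)   # sequential 4-unit writes to zone 1
dev.drain()
print(dev.wp)
```

In this model the only ordering that matters is the per-zone submission order,
which the plug queue preserves. Ordering between zones is left entirely to the
device.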

For DM, it is up to the target driver to determine if it is OK without zone
write plugging or if that will be needed, as the driver knows if it will
preserve (issue) writes in the same order it received them. E.g. dm-crypt does
not, so it sets the emulate zone append flag to use zone append emulation and
zone write plugging (note that these 2 aspects are aggregated into a single flag
because there was no need to control them separately for the existing DM targets
that support zones).

So I do not understand your point.

There are literally tens of millions of SMR drives running in production, a lot
of them using DM (e.g. dm-crypt). I would know if that was not working fine.

> 
>> What has not been covered are cases
>> like suspend/resume which may, depending on what they do, break the ordering
>> guarantees that we have for write requests. The only reason this has not been
>> fixed is because I completely overlooked these cases as zoned block devices
>> were in the past mostly used in enterprise systems where suspend/resume is not
>> really used at all. But we have zoned UFS devices these days (smart phones), so
>> properly supporting DM suspend/resume is important I think.
> 
> Do you mean zoned flash devices? I've never heard of them.

They exist and are gathering interest and use cases.


-- 
Damien Le Moal
Western Digital Research
