On 4/9/25 8:51 AM, Benjamin Marzinski wrote: Thanks for all the fixes. Of note is that I am looking into the dm-delay pre-suspend problem for zoned case (potential reordering of writes). It turns out to not be an easy fix given how dm-delay works. Still scratching my head :)
> There were multiple places in dm's __bind() function where it could fail > and not completely roll back, leaving the device using the the old > table, but with device limits and resources from the new table. > Additionally, unused mempools for request-based devices were not always > freed immediately. > > Also, there were a number of issues with switching zoned tables that > emulate zone append (in other words, dm-crypt on top of zoned devices). > dm_blk_report_zones() could be called while the device was initially > setting up a zoned table and creating zoned resources or could possibly > fail to end a srcu read section. More importantly, > blk_revalidate_disk_zones() would never get called when updating a zoned > table. This could cause the dm device to see the wrong zone write > offsets, not have a large enough zwplugs reserved in its mempool, or > read invalid memory when checking the conventional zones bitmap. > > Finally, any DM device created on top of a device emulating zone appeads > will automatically have zone write plug resources created for it, since > max_hw_zone_append_sectors will always be 0 for a device stacked on top > of a device max_hw_zone_append_sectors = 0 > > This patchset fixes these issues. It deals with the problems around > blk_revalidate_disk_zones() by only calling it for devices that have no > zone write plug resources. This will always correctly update the zoned > settings. If a device has zone write plug resources, calling > blk_revalidate_disk_zones() will not correctly update them in many > cases, so DM simply doesn't call it for devices with zone write plug > resources. Instead of allowing people to load tables that can break the > device, like currently happens, DM disallosw any table reloads that > change the zoned setting for devices that already have zone write plug > resources. Finally, it deals with the max_hw_zone_append_sectors issue > by making sure that it is non-zero for zoned DM devices that do not need > zone write append emulation. > > Changes in V3: > - Use queue_limits_start_update() instead of modifying > queue_limits_set() > - Rewrote the commit message for patch 0004 ("dm: fix > dm_blk_report_zones") to explain that this only happens when initially > setting up a table with zone append resources, so disallowing table > swaps after you set up a zoned dm-crypt device will not effect the > issue at all. I did not implement Christoph's suggestion because I > don't understand how it would work here. Perhaps I'm just being dense. > I'm not wedded to this solution. Any one that keeps this > use-after-free error from being possible is fine by me. > - Added a final patch to deal with the issue of stacked devices always > getting zone append resources if any underlying device needs them. > > Changes in V2: > - Made queue_limits_set() optionally return the old limits (grabbed > while holding the limits_lock), and used this in > dm_table_set_restrictions() > - dropped changes to disk_free_zone_resources() and the > blk_revalidate_disk_zones() code path (removed patches 0005 & 0006) > - Instead of always calling blk_revalidate_disk_zones(), disallow > changes that would change zone settings if the device has > zone write plug resources (final patch). > > Benjamin Marzinski (6): > dm: don't change md if dm_table_set_restrictions() fails > dm: free table mempools if not used in __bind > dm: handle failures in dm_table_set_restrictions > dm: fix dm_blk_report_zones > dm: limit swapping tables for devices with zone write plugs > dm: fix native zone append devices on top of emulated ones > > drivers/md/dm-core.h | 1 + > drivers/md/dm-table.c | 67 +++++++++++++++++++++++++------- > drivers/md/dm-zone.c | 90 ++++++++++++++++++++++++++++++------------- > drivers/md/dm.c | 36 ++++++++++------- > drivers/md/dm.h | 6 +++ > 5 files changed, 146 insertions(+), 54 deletions(-) > -- Damien Le Moal Western Digital Research