On Wed, 25 Jun 2025, Damien Le Moal wrote:
> Any zoned DM target that requires zone append emulation will use the
> block layer zone write plugging. In such a case, DM target drivers must
> not split BIOs using dm_accept_partial_bio(), as doing so can potentially
> lead to deadlocks with queue freeze operations. Regular write operations
> used to emulate zone append operations also cannot be split by the
> target driver, as that would result in an invalid written sector value
> being returned through the BIO sector.
>
> In order for zoned DM target drivers to avoid such incorrect BIO
> splitting, we must ensure that large BIOs are split before being passed
> to the map() function of the target, thus guaranteeing that the
> limits for the mapped device are not exceeded.
>
> dm-crypt and dm-flakey are the only target drivers supporting zoned
> devices and using dm_accept_partial_bio().
>
> In the case of dm-crypt, this function is used to split BIOs to the
> internal max_write_size limit (which will be suppressed in a different
> patch). However, crypt_alloc_buffer() uses a bioset allowing only up to
> BIO_MAX_VECS (256) vectors in a BIO. The dm-crypt device max_segments
> limit, which is not set and so defaults to BLK_MAX_SEGMENTS (128), must
> thus be respected and write BIOs split accordingly.
>
> In the case of dm-flakey, since zone append emulation is not required,
> the block layer zone write plugging is not used and no splitting of BIOs
> is required.
>
> Modify the function dm_zone_bio_needs_split() to use the block layer
> helper function bio_needs_zone_write_plugging() to force a call to
> bio_split_to_limits() in dm_split_and_process_bio(). This allows DM
> target drivers to avoid using dm_accept_partial_bio() for write
> operations on zoned DM devices.
>
> Fixes: f211268ed1f9 ("dm: Use the block layer zone append emulation")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlem...@kernel.org>
Reviewed-by: Mikulas Patocka <mpato...@redhat.com>
> ---
> drivers/md/dm.c | 29 ++++++++++++++++++++++-------
> 1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index e477765cdd27..f1e63c1808b4 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1773,12 +1773,29 @@ static inline bool dm_zone_bio_needs_split(struct mapped_device *md,
> struct bio *bio)
> {
> /*
> - * For mapped device that need zone append emulation, we must
> - * split any large BIO that straddles zone boundaries.
> + * Special case the zone operations that cannot or should not be split.
> */
> - return dm_emulate_zone_append(md) && bio_straddles_zones(bio) &&
> - !bio_flagged(bio, BIO_ZONE_WRITE_PLUGGING);
> + switch (bio_op(bio)) {
> + case REQ_OP_ZONE_APPEND:
> + case REQ_OP_ZONE_FINISH:
> + case REQ_OP_ZONE_RESET:
> + case REQ_OP_ZONE_RESET_ALL:
> + return false;
> + default:
> + break;
> + }
> +
> + /*
> + * Mapped devices that require zone append emulation will use the block
> + * layer zone write plugging. In such case, we must split any large BIO
> + * to the mapped device limits to avoid potential deadlocks with queue
> + * freeze operations.
> + */
> + if (!dm_emulate_zone_append(md))
> + return false;
> + return bio_needs_zone_write_plugging(bio) || bio_straddles_zones(bio);
> }
> +
> static inline bool dm_zone_plug_bio(struct mapped_device *md, struct bio *bio)
> {
> if (!bio_needs_zone_write_plugging(bio))
> @@ -1927,9 +1944,7 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>
> is_abnormal = is_abnormal_io(bio);
> if (static_branch_unlikely(&zoned_enabled)) {
> - /* Special case REQ_OP_ZONE_RESET_ALL as it cannot be split. */
> - need_split = (bio_op(bio) != REQ_OP_ZONE_RESET_ALL) &&
> - (is_abnormal || dm_zone_bio_needs_split(md, bio));
> + need_split = is_abnormal || dm_zone_bio_needs_split(md, bio);
> } else {
> need_split = is_abnormal;
> }
> --
> 2.49.0
>
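
For readers following the logic change, here is a minimal userspace sketch
of the split decision as the patch above restructures it. It is not kernel
code: the model_* names, the enum, and the flattened struct are hypothetical
stand-ins, and the two boolean fields simply replace the results of
bio_needs_zone_write_plugging() and bio_straddles_zones(), whose real
behavior is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the request operations the patch checks. */
enum model_op {
	MODEL_OP_READ,
	MODEL_OP_WRITE,
	MODEL_OP_ZONE_APPEND,
	MODEL_OP_ZONE_FINISH,
	MODEL_OP_ZONE_RESET,
	MODEL_OP_ZONE_RESET_ALL,
};

/* Hypothetical flattened view of what the patched function inspects. */
struct model_bio {
	enum model_op op;
	bool needs_zone_write_plugging;	/* models bio_needs_zone_write_plugging(bio) */
	bool straddles_zones;		/* models bio_straddles_zones(bio) */
};

/* Mirrors the control flow of the patched dm_zone_bio_needs_split(). */
static bool model_zone_bio_needs_split(bool emulate_zone_append,
				       const struct model_bio *bio)
{
	/* Zone operations that cannot or should not be split. */
	switch (bio->op) {
	case MODEL_OP_ZONE_APPEND:
	case MODEL_OP_ZONE_FINISH:
	case MODEL_OP_ZONE_RESET:
	case MODEL_OP_ZONE_RESET_ALL:
		return false;
	default:
		break;
	}

	/* Without zone append emulation, zone write plugging is not used. */
	if (!emulate_zone_append)
		return false;

	return bio->needs_zone_write_plugging || bio->straddles_zones;
}

/* Mirrors the need_split computation in dm_split_and_process_bio(). */
static bool model_need_split(bool zoned_enabled, bool is_abnormal,
			     bool emulate_zone_append,
			     const struct model_bio *bio)
{
	if (zoned_enabled)
		return is_abnormal ||
		       model_zone_bio_needs_split(emulate_zone_append, bio);
	return is_abnormal;
}
```

In this model, a plugged write on a device that emulates zone append is
always sent through the split path, while the same write on a device
without emulation (the dm-flakey case described in the commit message) is
not.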