On Mon, Dec 08, 2025 at 06:07:14PM +0800, Yongpeng Yang wrote:
> From: Yongpeng Yang <[email protected]>
>
> Currently, the max_hw_discard_sectors of a stripe target is set to the
> minimum max_hw_discard_sectors among all sub devices. When the discard
> bio is larger than max_hw_discard_sectors, this may cause the stripe
> device to split discard bios unnecessarily, because the value of
> max_hw_discard_sectors affects max_discard_sectors, which is equal to
> min(max_hw_discard_sectors, max_user_discard_sectors).
>
> For example:
> root@vm:~# echo '0 33554432 striped 2 256 /dev/vdd 0 /dev/vde 0' | dmsetup create stripe_dev
> root@vm:~# cat /sys/block/dm-1/queue/discard_max_bytes
> 536870912
> root@vm:~# cat /sys/block/dm-1/slaves/vdd/queue/discard_max_bytes
> 536870912
> root@vm:~# blkdiscard -o 0 -l 1073741824 -p 1073741824 /dev/mapper/stripe_dev
>
> dm-1 is the stripe device, and its discard_max_bytes is equal to
> each sub device’s discard_max_bytes. Since the requested discard
> length exceeds discard_max_bytes, the block layer splits the discard bio:
>
> block_bio_queue: 252,1 DS 0 + 2097152 [blkdiscard]
> block_split: 252,1 DS 0 / 1048576 [blkdiscard]
> block_rq_issue: 253,48 DS 268435456 () 0 + 524288 be,0,4 [blkdiscard]
> block_bio_queue: 253,64 DS 524288 + 524288 [blkdiscard]
>
> However, both vdd and vde can actually handle a discard bio of 536870912
> bytes, so this split is not necessary.
>
> This patch updates the stripe target’s q->limits.max_hw_discard_sectors
> to be the minimum max_hw_discard_sectors of the sub devices multiplied
> by the number of stripe devices. This enables the stripe device to handle
> larger discard bios without incurring unnecessary splitting.
>
> Signed-off-by: Yongpeng Yang <[email protected]>
> ---
> drivers/md/dm-stripe.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
> index 1461dc740dae..799d6def699b 100644
> --- a/drivers/md/dm-stripe.c
> +++ b/drivers/md/dm-stripe.c
> @@ -38,6 +38,9 @@ struct stripe_c {
> uint32_t chunk_size;
> int chunk_size_shift;
>
> + /* The minimum max_hw_discard_sectors of all sub devices. */
> + unsigned int max_hw_discard_sectors;
> +
> /* Needed for handling events */
> struct dm_target *ti;
>
> @@ -169,6 +172,8 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> * Get the stripe destinations.
> */
> for (i = 0; i < stripes; i++) {
> + struct request_queue *q;
> +
> argv += 2;
>
> r = get_stripe(ti, sc, i, argv);
> @@ -180,6 +185,13 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
> return r;
> }
> atomic_set(&(sc->stripe[i].error_count), 0);
> +
> + q = bdev_get_queue(sc->stripe[i].dev->bdev);
> + if (i == 0)
> + sc->max_hw_discard_sectors = q->limits.max_hw_discard_sectors;
> + else
> + sc->max_hw_discard_sectors = min_not_zero(sc->max_hw_discard_sectors,
> + q->limits.max_hw_discard_sectors);
I don't think any of the above is necessary. When stripe_io_hints() is
called, dm_set_device_limits() will already have been called on all the
underlying stripe devices to combine the limits. So
limits->max_hw_discard_sectors should already be set to the same value
as you're computing for sc->max_hw_discard_sectors. Right?
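For reference (quoting roughly from memory, so double-check the exact
code), blk_stack_limits() already combines the discard limits of each
underlying device when dm_set_device_limits() stacks them:

	t->max_discard_sectors = min_not_zero(t->max_discard_sectors,
					      b->max_discard_sectors);
	t->max_hw_discard_sectors = min_not_zero(t->max_hw_discard_sectors,
						 b->max_hw_discard_sectors);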
> }
>
> ti->private = sc;
> @@ -456,7 +468,7 @@ static void stripe_io_hints(struct dm_target *ti,
> struct queue_limits *limits)
> {
> struct stripe_c *sc = ti->private;
> - unsigned int io_min, io_opt;
> + unsigned int io_min, io_opt, max_hw_discard_sectors;
>
> limits->chunk_sectors = sc->chunk_size;
>
> @@ -465,6 +477,8 @@ static void stripe_io_hints(struct dm_target *ti,
> limits->io_min = io_min;
> limits->io_opt = io_opt;
> }
> + if (!check_mul_overflow(sc->max_hw_discard_sectors, sc->stripes, &max_hw_discard_sectors))
sc->max_hw_discard_sectors should be the same as
limits->max_hw_discard_sectors here.
> + limits->max_hw_discard_sectors = max_hw_discard_sectors;
I see a couple of issues with this calculation. First, this only works
if max_hw_discard_sectors is greater than sc->chunk_size. But more than
that, before you multiply the original limit by the number of stripes,
you must round it down to a multiple of chunk_size. Otherwise, you can
end up discarding too much on some of the underlying devices. To show
this with simple numbers, imagine you had 3 stripes with a chunk_size of
4, and all the underlying devices had a max_hw_discard_sectors of 5. You
would set max_hw_discard_sectors to 5 * 3 = 15. But if you discarded 15
sectors from the beginning of the device, you would end up discarding
the first 4 sectors of each underlying device, and then loop around and
discard 3 more sectors of the first device. That means you would discard
7 sectors from the first device, instead of the 5 that it can handle.
Rounding max_hw_discard_sectors down to a multiple of chunk_size before
multiplying fixes that: rounddown(5, 4) = 4, the limit becomes 4 * 3 =
12, and a 12-sector discard then touches exactly 4 sectors on each device.
Lastly, if the multiplication of max_hw_discard_sectors by the number
of stripes does overflow, you should probably just set
limits->max_hw_discard_sectors to UINT_MAX >> SECTOR_SHIFT, instead of
leaving it at whatever it was.
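To sketch what I mean (untested, and ignoring the case where the stacked
limit is smaller than chunk_size), stripe_io_hints() could do something
like:

	unsigned int max_hw_discard_sectors;

	/* Round down to whole chunks before scaling by the stripe count. */
	max_hw_discard_sectors = rounddown(limits->max_hw_discard_sectors,
					   sc->chunk_size);
	/* Cap the limit on overflow instead of leaving it unchanged. */
	if (check_mul_overflow(max_hw_discard_sectors, sc->stripes,
			       &max_hw_discard_sectors))
		max_hw_discard_sectors = UINT_MAX >> SECTOR_SHIFT;
	limits->max_hw_discard_sectors = max_hw_discard_sectors;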
-Ben
> }
>
> static struct target_type stripe_target = {
> --
> 2.43.0