On 6/5/25 8:38 PM, John Garry wrote:
> The atomic write unit max is limited by any stack device stripe size.
> 
> It is required that the atomic write unit is a power-of-2 factor of the
> stripe size.
> 
> Currently we use io_min limit to hold the stripe size, and check for a
> io_min <= SECTOR_SIZE when deciding if we have a striped stacked device.
> 
> Nilay reports that this causes a problem when the physical block size is
> greater than SECTOR_SIZE [0].
> 
> Furthermore, io_min may be mutated when stacking devices, and this makes
> it a poor candidate to hold the stripe size. Such an example would be
> when the io_min is less than the physical block size.
> 
> Use chunk_sectors to hold the stripe size, which is more appropriate.
> 
> [0] https://lore.kernel.org/linux-block/888f3b1d-7817-4007-b3b3-1a2ea04df...@linux.ibm.com/T/#mecca17129f72811137d3c2f1e477634e77f06781
> 
> Signed-off-by: John Garry <john.g.ga...@oracle.com>
> ---
>  block/blk-settings.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a000daafbfb4..5b0f1a854e81 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -594,11 +594,13 @@ static bool blk_stack_atomic_writes_boundary_head(struct queue_limits *t,
>  static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>                               struct queue_limits *b)
>  {
> +     unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;
> +
>       if (b->atomic_write_hw_boundary &&
>           !blk_stack_atomic_writes_boundary_head(t, b))
>               return false;
>  
> -     if (t->io_min <= SECTOR_SIZE) {
> +     if (!t->chunk_sectors) {
>               /* No chunk sectors, so use bottom device values directly */
>               t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
>               t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
> @@ -617,12 +619,12 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>        * aligned with both limits, i.e. 8K in this example.
>        */
>       t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
> -     while (t->io_min % t->atomic_write_hw_unit_max)
> +     while (chunk_size % t->atomic_write_hw_unit_max)
>               t->atomic_write_hw_unit_max /= 2;
>  
>       t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
>                                         t->atomic_write_hw_unit_max);
> -     t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
> +     t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
>  
>       return true;
>  }

This works well with my NVMe disk, which supports atomic writes. My only
concern is what happens if t->chunk_sectors is also set for an NVMe disk.
I see that nvme_set_chunk_sectors() initializes chunk_sectors for NVMe, and
the value it assigns to lim->chunk_sectors represents "noiob" (i.e. Namespace
Optimal I/O Boundary). My disk has "noiob" set to zero, but if it were
non-zero, would it break the above logic for NVMe atomic writes?
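
To make the question concrete, here is a small userspace sketch of the
arithmetic in the patched blk_stack_atomic_writes_head(). It is not the kernel
code: struct limits_sketch, min_uint() and stack_atomic_sketch() are
hypothetical stand-ins that copy only the fields and steps visible in the diff,
and it assumes the bottom device's atomic_write_hw_unit_max is a power of two.
Whether this path is actually taken with a non-zero "noiob" is exactly the open
question.

```c
#include <stddef.h>

#define SECTOR_SHIFT 9

/* Hypothetical stand-in for the queue_limits fields used by the patch. */
struct limits_sketch {
	unsigned int chunk_sectors;            /* 0 means no chunk boundary */
	unsigned int atomic_write_hw_unit_min; /* bytes */
	unsigned int atomic_write_hw_unit_max; /* bytes, assumed power of two */
	unsigned int atomic_write_hw_max;      /* bytes */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Mirrors the steps shown in the diff; returns 1 on success, 0 otherwise. */
static int stack_atomic_sketch(struct limits_sketch *t,
			       const struct limits_sketch *b)
{
	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;

	/* Guard added for the sketch only: avoid a modulo by zero below. */
	if (!b->atomic_write_hw_unit_max)
		return 0;

	if (!t->chunk_sectors) {
		/* No chunk sectors, so use bottom device values directly */
		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
		t->atomic_write_hw_max = b->atomic_write_hw_max;
		return 1;
	}

	/* Halve unit_max until it divides the chunk size evenly. */
	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
	while (chunk_size % t->atomic_write_hw_unit_max)
		t->atomic_write_hw_unit_max /= 2;

	t->atomic_write_hw_unit_min = min_uint(b->atomic_write_hw_unit_min,
					       t->atomic_write_hw_unit_max);
	t->atomic_write_hw_max = min_uint(b->atomic_write_hw_max, chunk_size);
	return 1;
}
```

With bottom limits of unit_min = 4K, unit_max = 16K and hw_max = 64K (values
chosen for illustration), a chunk_sectors of 128 (64K) leaves unit_max at 16K,
while a chunk_sectors of 24 (12K) reduces unit_max to 4K and caps hw_max at
12K. So in this sketch a non-zero chunk_sectors only shrinks the limits; it
does not obviously break them.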

Thanks,
--Nilay


