On Thu, Oct 16, 2025 at 12:59:51PM -0700, Andrew Morton wrote:
> On Tue, 14 Oct 2025 16:47:31 +0200 "Uladzislau Rezki (Sony)" 
> <[email protected]> wrote:
> 
> > When performing a read-modify-write(RMW) operation, any modification
> > to a buffered block must cause the entire buffer to be marked dirty.
> > 
> > Marking only a subrange as dirty is incorrect because the underlying
> > device block size(ubs) defines the minimum read/write granularity. A
> > lower device can perform I/O only on regions which are fully aligned
> > and sized to ubs.
> > 
> > This change ensures that write-back operations always occur in full
> > ubs-sized chunks, matching the intended emulation semantics of the
> > EBS target.
> 
> It sounds like this can result in corruption under some circumstances?
> 
> It would be helpful if you could spell this out clearly, please.  What
> are the userspace-visible effects of this bug and how are those effects
> demonstrated?

See below:

<snip>
commit 333b5e9ff2ccb35c3040fa8b0fd7011dfd42aae2
Author: Uladzislau Rezki (Sony) <[email protected]>
Date:   Wed Oct 8 19:49:50 2025 +0200

    dm-ebs: Mark full buffer dirty even on partial write
    
    When performing a read-modify-write(RMW) operation, any modification
    to a buffered block must cause the entire buffer to be marked dirty.
    
    Marking only a subrange as dirty is incorrect because the underlying
    device block size(ubs) defines the minimum read/write granularity. A
    lower device can perform I/O only on regions which are fully aligned
    and sized to ubs.
    
    This change ensures that write-back operations always occur in full
    ubs-sized chunks, matching the intended emulation semantics of the
    EBS target.
    
    As for the user-space-visible impact: a lower device that accepts
    only ubs-sized, ubs-aligned I/O rejects sub-ubs and misaligned
    requests, so the data in those writes can be lost. Example:
    
    1) Create an 8K NVMe device in qemu by adding
    
    -device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192
    
    2) Set up dm-ebs to emulate a 512B block size on top of the 8K device.
    
    urezki@pc638:~/bin$ cat dmsetup.sh
    
    lower=/dev/nvme0n1
    len=$(blockdev --getsz "$lower")
    
    echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k
    urezki@pc638:~/bin$
    
    The table arguments are offset 0, ebs=1 and ubs=16 (in 512-byte sectors).
    
    3) Create an ext4 filesystem (default 4K block size)
    
    urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
    mke2fs 1.47.0 (5-Feb-2023)
    Discarding device blocks: done
    Creating filesystem with 2072576 4k blocks and 518144 inodes
    Filesystem UUID: bd0b6ca6-0506-4e31-86da-8d22c9d50b63
    Superblock backups stored on blocks:
            32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
    
    Allocating group tables: done
    Writing inode tables: done
    Creating journal (16384 blocks): done
    Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system
    urezki@pc638:~/bin$ dmesg
    
    <snip>
    [ 1618.875449] buffer_io_error: 1028 callbacks suppressed
    [ 1618.875456] Buffer I/O error on dev dm-0, logical block 0, lost async page write
    [ 1618.875527] Buffer I/O error on dev dm-0, logical block 1, lost async page write
    [ 1618.875602] Buffer I/O error on dev dm-0, logical block 2, lost async page write
    [ 1618.875620] Buffer I/O error on dev dm-0, logical block 3, lost async page write
    [ 1618.875639] Buffer I/O error on dev dm-0, logical block 4, lost async page write
    [ 1618.894316] Buffer I/O error on dev dm-0, logical block 5, lost async page write
    [ 1618.894358] Buffer I/O error on dev dm-0, logical block 6, lost async page write
    [ 1618.894380] Buffer I/O error on dev dm-0, logical block 7, lost async page write
    [ 1618.894405] Buffer I/O error on dev dm-0, logical block 8, lost async page write
    [ 1618.894427] Buffer I/O error on dev dm-0, logical block 9, lost async page write
    <snip>
    
    Many I/O errors are reported because the lower 8K device rejects
    sub-ubs and misaligned requests.
    
    With the patch applied:
    
    urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0
    mke2fs 1.47.0 (5-Feb-2023)
    Discarding device blocks: done
    Creating filesystem with 2072576 4k blocks and 518144 inodes
    Filesystem UUID: 9b54f44f-ef55-4bd4-9e40-c8b775a616ac
    Superblock backups stored on blocks:
            32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
    
    Allocating group tables: done
    Writing inode tables: done
    Creating journal (16384 blocks): done
    Writing superblocks and filesystem accounting information: done
    
    urezki@pc638:~/bin$ sudo mount /dev/dm-0 /mnt/
    urezki@pc638:~/bin$ ls -al /mnt/
    total 24
    drwxr-xr-x  3 root root  4096 Oct 17 15:13 .
    drwxr-xr-x 19 root root  4096 Jul 10 19:42 ..
    drwx------  2 root root 16384 Oct 17 15:13 lost+found
    urezki@pc638:~/bin$
    
    After this change: mkfs completes; mount succeeds.
    
    Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>

diff --git a/drivers/md/dm-ebs-target.c b/drivers/md/dm-ebs-target.c
index 6abb31ca9662..b354e74a670e 100644
--- a/drivers/md/dm-ebs-target.c
+++ b/drivers/md/dm-ebs-target.c
@@ -103,7 +103,7 @@ static int __ebs_rw_bvec(struct ebs_c *ec, enum req_op op, struct bio_vec *bv,
                        } else {
                                flush_dcache_page(bv->bv_page);
                                memcpy(ba, pa, cur_len);
-                               dm_bufio_mark_partial_buffer_dirty(b, buf_off, buf_off + cur_len);
+                               dm_bufio_mark_buffer_dirty(b);
                        }
 
                        dm_bufio_release(b);
<snip>
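
To make the failure mode concrete, here is a small user-space model of the
two marking strategies. It is only a sketch, not the kernel code: the struct
and helper names are made up for illustration, and it assumes the lower
device accepts a write only if both its start and its length are multiples
of ubs. Only the ubs value (16 sectors) and the 4K filesystem block size
come from the example above.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTOR_SIZE 512u

/* Hypothetical model of the byte range of a buffer queued for write-back. */
struct dirty_range {
	unsigned int start;	/* first dirty byte, inclusive */
	unsigned int end;	/* last dirty byte, exclusive */
};

/* Convert a size given in 512-byte sectors to bytes. */
static inline unsigned int sectors_to_bytes(unsigned int sectors)
{
	return sectors * SECTOR_SIZE;
}

/* Old behaviour (modelled): only the modified subrange is marked dirty. */
static inline struct dirty_range mark_partial(unsigned int buf_off,
					      unsigned int cur_len)
{
	assert(cur_len > 0);
	return (struct dirty_range){ buf_off, buf_off + cur_len };
}

/* New behaviour (modelled): the whole ubs-sized buffer is marked dirty. */
static inline struct dirty_range mark_full(unsigned int ubs_bytes)
{
	return (struct dirty_range){ 0, ubs_bytes };
}

/*
 * Assumed device rule: a write is accepted only if it starts on a ubs
 * boundary within the buffer and its length is a non-zero multiple of ubs.
 */
static inline bool device_accepts(struct dirty_range r, unsigned int ubs_bytes)
{
	unsigned int len = r.end - r.start;

	return len != 0 && r.start % ubs_bytes == 0 && len % ubs_bytes == 0;
}
```

With ubs = 16 sectors (8192 bytes), a 4K filesystem block copied to buffer
offset 4096 gives the partial range [4096, 8192), which device_accepts()
rejects in this model. mark_full() gives [0, 8192), which it accepts.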

--
Uladzislau Rezki
