Hi,

在 2025/08/04 17:15, Xiao Ni 写道:
Hi Kuai

I found one thing. The interface ->blocks_synced doesn't work in the
write io path. So there is a risk of data corruption. For example:

mdadm -CR /dev/md0 -l5 -n5 /dev/loop[0-4] --bitmap=lockless (all bits
are unwritten because lazy initial recovery)
1. D1 D2 D3 D4 P1, a small write hits D2. rmw is 2 (need to read old
data of D2 and P1), rcw is 3 (need to read D1 D3 and D4).
2. D2 disk fails
3. read data from disk2. It needs to calculate the data from other
disks. But the result is not the real data which was written to D2

So ->blocks_synced needs to be checked in handle_stripe_dirtying.

This is correct, the write path is missing. Will fix it in the next
version.

Thanks,
Kuai


Best Regards
Xiao

On Fri, Aug 1, 2025 at 3:11 PM Yu Kuai <yuku...@huaweicloud.com> wrote:

From: Yu Kuai <yuku...@huawei.com>

Currently, raid456 must perform a whole array initial recovery to build
initail xor data, then IO to the array won't have to read all the blocks
in underlying disks.

This behavior will affect IO performance a lot, and nowadays there are
huge disks and the initial recovery can take a long time. Hence llbitmap
will support lazy initial recovery in following patches. This method is
used to check if data blocks is synced or not, if not then IO will still
have to read all blocks for raid456.

Signed-off-by: Yu Kuai <yuku...@huawei.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Hannes Reinecke <h...@suse.de>
Reviewed-by: Xiao Ni <x...@redhat.com>
Reviewed-by: Li Nan <linan...@huawei.com>
---
  drivers/md/md-bitmap.h | 1 +
  drivers/md/raid5.c     | 6 ++++++
  2 files changed, 7 insertions(+)

diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 95453696c68e..5f41724cbcd8 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -90,6 +90,7 @@ struct bitmap_operations {
         md_bitmap_fn *end_discard;

         sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset);
+       bool (*blocks_synced)(struct mddev *mddev, sector_t offset);
         bool (*start_sync)(struct mddev *mddev, sector_t offset,
                            sector_t *blocks, bool degraded);
         void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t 
*blocks);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5285e72341a2..2121f0ff5e30 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3748,6 +3748,7 @@ static int want_replace(struct stripe_head *sh, int 
disk_idx)
  static int need_this_block(struct stripe_head *sh, struct stripe_head_state 
*s,
                            int disk_idx, int disks)
  {
+       struct mddev *mddev = sh->raid_conf->mddev;
         struct r5dev *dev = &sh->dev[disk_idx];
         struct r5dev *fdev[2] = { &sh->dev[s->failed_num[0]],
                                   &sh->dev[s->failed_num[1]] };
@@ -3762,6 +3763,11 @@ static int need_this_block(struct stripe_head *sh, 
struct stripe_head_state *s,
                  */
                 return 0;

+       /* The initial recover is not done, must read everything */
+       if (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced &&
+           !mddev->bitmap_ops->blocks_synced(mddev, sh->sector))
+               return 1;
+
         if (dev->toread ||
             (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)))
                 /* We need this block to directly satisfy a request */
--
2.39.2



.



Reply via email to