On 7/21/25 09:12, Yu Kuai wrote:
Hi,
在 2025/07/21 14:20, Hannes Reinecke 写道:
On 7/18/25 11:23, Yu Kuai wrote:
From: Yu Kuai <yuku...@huawei.com>
+
+#define BITMAP_DATA_OFFSET 1024
+
+/* 64k is the max IO size of sync IO for raid1/raid10 */
+#define MIN_CHUNK_SIZE (64 * 2)
+
Hmm? Which one is it?
Comment says 'max IO size', but it's called 'MIN_CHUNK_SIZE'...
max IO size here means the internal recovery IO size for raid1/10,
and we're handling at most one llbimtap bit(chunksie) at a time, so
chunksize should be at least 64k, otherwise recovery IO size will be
less.
Okay.
+/*
+ * Dirtied bits that have not been accessed for more than 5s will be
cleared
+ * by daemon.
+ */
+#define BARRIER_IDLE 5
+
Should this be changeable, too?
Yes, idealy this should. Perhaps a new sysfs api?
Yes, similarly to the daemon_delay one.
+ if (!test_bit(LLPageDirty, &pctl->flags))
+ set_bit(LLPageDirty, &pctl->flags);
+
+ /*
+ * The subpage usually contains a total of 512 bits. If any
single bit
+ * within the subpage is marked as dirty, the entire sector will be
+ * written. To avoid impacting write performance, when multiple
bits
+ * within the same sector are modified within a short time
frame, all
+ * bits in the sector will be collectively marked as dirty at once.
+ */
How short is the 'short timeframe'?
Is this the BARRIER_IDLE setting?
Please clarify.
Yes, if the page is not accessed for BARRIER_IDLE seconds.
Please update the comment to refer to that.
+static struct page *llbitmap_read_page(struct llbitmap *llbitmap,
int idx)
+{
+ struct mddev *mddev = llbitmap->mddev;
+ struct page *page = NULL;
+ struct md_rdev *rdev;
+
+ if (llbitmap->pctl && llbitmap->pctl[idx])
+ page = llbitmap->pctl[idx]->page;
+ if (page)
+ return page;
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ rdev_for_each(rdev, mddev) {
+ sector_t sector;
+
+ if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
+ continue;
+
+ sector = mddev->bitmap_info.offset +
+ (idx << PAGE_SECTORS_SHIFT);
+
+ if (sync_page_io(rdev, sector, PAGE_SIZE, page, REQ_OP_READ,
+ true))
+ return page;
+
+ md_error(mddev, rdev);
+ }
+
+ __free_page(page);
+ return ERR_PTR(-EIO);
+}
+
Have you considered moving to folios here?
Of course, however, because the md high level helpers is still using
page, I'm thinking about using page for now, which is simpler, and
moving to folios for all md code later.
Fair enough.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
h...@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich