On Mon, Jan 22, 2024 at 05:30:07PM +0100, Mikulas Patocka wrote:
> Hi
>
>
> On Fri, 19 Jan 2024, Ming Lei wrote:
>
> > Hi Mikulas,
> >
> > On Thu, Aug 10, 2023 at 12:07:07PM +0200, Mikulas Patocka wrote:
> > > Hi
> > >
> > > Here I'm submitting the ramdisk discard patches for the next merge
> > > window.
> > > If you want to make some more changes, please let me know.
> >
> > brd discard was removed in f09a06a193d9 ("brd: remove discard support")
> > in 2017 because it was just a driver-private write-zeroes implementation,
> > and users can get the same result with fallocate(FALLOC_FL_ZERO_RANGE).
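
For illustration, a minimal user-space sketch of the fallocate() route. zero_range() is a hypothetical helper, not part of any existing tool. It assumes the target accepts FALLOC_FL_ZERO_RANGE (block devices additionally require the offset and length to be aligned to the logical block size, otherwise fallocate() fails with EINVAL) and falls back to writing zero buffers only when the call reports EOPNOTSUPP or ENOSYS, as happens on filesystems such as tmpfs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Zero the byte range [offset, offset + len) of an open file or block
 * device. Tries fallocate(FALLOC_FL_ZERO_RANGE) first; if the kernel or
 * filesystem does not support that mode, falls back to writing zeroes.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int zero_range(int fd, off_t offset, off_t len)
{
	static const char zeros[4096];	/* static storage: all zero bytes */

	assert(offset >= 0 && len >= 0);
	if (len == 0)
		return 0;		/* fallocate() rejects len == 0 */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return -1;		/* e.g. EINVAL for a misaligned bdev range */

	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeros) ?
			       (size_t)len : sizeof(zeros);
		ssize_t n = pwrite(fd, zeros, chunk, offset);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted before writing: retry */
		if (n <= 0) {
			if (n == 0)
				errno = EIO;	/* no progress: give up */
			return -1;
		}
		offset += n;
		len -= n;
	}
	return 0;
}
```

On a brd device the same call would be made on a file descriptor opened read-write on /dev/ramN.
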
> >
> > Also you only mentioned the motivation in V1 cover-letter:
> >
> > https://lore.kernel.org/linux-block/alpine.lrh.2.02.2209151604410.13...@file01.intranet.prod.int.rdu2.redhat.com/
> >
> > ```
> > Zdenek asked me to write it, because we use brd in the lvm2 testsuite and
> > it would be beneficial to run the testsuite with discard enabled in order
> > to test discard handling.
> > ```
> >
> > But we have lots of test disks with discard support: loop, scsi_debug,
> > null_blk, ublk, ..., so one question is: why is brd discard a must
> > for the lvm2 testsuite to cover (lvm) discard handling?
>
> We should ask Zdeněk Kabeláč about it - he is the expert on the lvm2
> testsuite.
>
> > The reason why brd didn't support discard by freeing pages is the risk
> > of writeback deadlock, see:
> >
> > commit f09a06a193d9 ("brd: remove discard support")
> >
> > -static void discard_from_brd(struct brd_device *brd,
> > -			sector_t sector, size_t n)
> > -{
> > -	while (n >= PAGE_SIZE) {
> > -		/*
> > -		 * Don't want to actually discard pages here because
> > -		 * re-allocating the pages can result in writeback
> > -		 * deadlocks under heavy load.
> > -		 */
> > -		if (0)
> > -			brd_free_page(brd, sector);
> > -		else
> > -			brd_zero_page(brd, sector);
> > -		sector += PAGE_SIZE >> SECTOR_SHIFT;
> > -		n -= PAGE_SIZE;
> > -	}
> > -}
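
A user-space model of the loop above, to make its arithmetic explicit. It assumes 4 KiB pages and 512-byte sectors, so each iteration advances 8 sectors. discard_pages() is a hypothetical name; it only records the page index each iteration would have passed to brd_zero_page() (or brd_free_page()), using the sector >> PAGE_SECTORS_SHIFT indexing that brd uses for its pages. A trailing remainder shorter than a page is silently skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages and 512-byte sectors. */
#define SECTOR_SHIFT		9
#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define PAGE_SECTORS_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)	/* 8 sectors/page */

typedef uint64_t sector_t;

/*
 * Model of the removed discard_from_brd() loop: store the index of every
 * page the loop would zero (or free) in pages[] and return the count.
 * Only whole pages are processed; a tail with n < PAGE_SIZE is ignored.
 */
static size_t discard_pages(sector_t sector, size_t n,
			    uint64_t *pages, size_t max_pages)
{
	size_t count = 0;

	while (n >= PAGE_SIZE) {
		assert(count < max_pages);
		pages[count++] = sector >> PAGE_SECTORS_SHIFT;	/* page index */
		sector += PAGE_SIZE >> SECTOR_SHIFT;
		n -= PAGE_SIZE;
	}
	return count;
}
```
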
> >
> > However, you didn't mention how your patches address this potential
> > risk. Could you document it? I can't find anything in the patches about
> > this problem.
>
> The writeback deadlock can happen even without discard - if the machine
> runs out of memory while writing data to a ramdisk. But the probability is
> increased when discard is used, because pages are freed and re-allocated
> more often.

Yeah, I agree. What I meant is that this needs to be documented, given
that discard is being re-introduced and the original deadlock comment
hasn't been addressed.

>
> Generally, the admin should make sure that the machine has enough
> available memory when creating a ramdisk - then, the deadlock can't
> happen.
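
One way an admin could approximate that check before loading brd, as a sketch: compare the worst-case ramdisk footprint (rd_nr disks of rd_size KiB each, the brd module parameters) against the MemAvailable value in /proc/meminfo, keeping some headroom. parse_mem_available_kb(), ramdisk_fits() and the reserve value are hypothetical. MemAvailable is only the kernel's estimate and other workloads keep allocating, so this is a heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the "MemAvailable:" value (in KiB) from text formatted like
 * /proc/meminfo. Returns -1 if the field is missing or malformed.
 */
static long long parse_mem_available_kb(const char *meminfo)
{
	const char *p = strstr(meminfo, "MemAvailable:");
	long long kb;

	if (!p || sscanf(p, "MemAvailable: %lld kB", &kb) != 1)
		return -1;
	return kb;
}

/*
 * Would rd_nr ramdisks of rd_size_kb KiB each fit into avail_kb while
 * leaving reserve_kb for everything else? Assumes the worst case, i.e.
 * every ramdisk page eventually gets allocated.
 */
static bool ramdisk_fits(unsigned long long rd_nr,
			 unsigned long long rd_size_kb,
			 long long avail_kb, long long reserve_kb)
{
	assert(reserve_kb >= 0);
	if (avail_kb < reserve_kb)
		return false;	/* also covers avail_kb == -1 (unknown) */
	return rd_nr * rd_size_kb <=
	       (unsigned long long)(avail_kb - reserve_kb);
}
```

In practice the text passed to parse_mem_available_kb() would be the contents of /proc/meminfo.
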
>
> A ramdisk has no limit on the number of allocated pages, so when memory
> runs out, the OOM killer will try to kill unrelated processes and the
> machine will hang. If there is a risk of overflowing the available memory,
> the admin should use tmpfs instead of a ramdisk - tmpfs can be configured
> with a size limit and it can also swap out pages.
>
> > BTW, your patches look more complicated than the original discard
> > implementation that was removed. If the above questions get addressed,
> > I am happy to review the following patches.
>
> My patches actually free the discarded pages. The original discard
> implementation just overwrote the pages with zeroes without freeing them.

The original implementation did support discarding by freeing pages; that
path was just unconditionally bypassed by:

	if (0)
		brd_free_page(brd, sector);
	else
		brd_zero_page(brd, sector);

However, once discard really frees pages, a page could be freed while it
is still being accessed in brd_do_bvec().

Maybe your patch "brd: extend the rcu regions to cover read and write"
can be simplified a bit, for example:

- grab the rcu read lock in brd_do_bvec()
- release the rcu read lock around alloc_page() in brd_insert_page()
- free pages via RCU

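
To make the RCU variant concrete, a user-space model of its shape: readers bracket every page access with a read-side section, discard unpublishes the page at once, and the free is deferred until no reader can still hold the old pointer. This is NOT kernel RCU. It is a deliberately naive stand-in using one global reader counter with C11 seq_cst atomics; it assumes discard and reclaim are serialized by the caller, and all names (slot_read_lock(), slot_discard(), slot_reclaim(), ...) are hypothetical. In the kernel the read lock also has to be dropped around alloc_page() because that allocation may sleep and sleeping inside an RCU read-side critical section is not allowed; the model does not capture that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_SLOTS	16
#define MAX_DEFERRED	16

struct model_page {
	unsigned char data[4096];
};

static _Atomic(struct model_page *) slots[NR_SLOTS];	/* published pages */
static atomic_long readers;				/* active readers */
static struct model_page *deferred[MAX_DEFERRED];	/* awaiting free */
static size_t nr_deferred;

/* Read side: stands in for rcu_read_lock()/rcu_read_unlock(). */
static void slot_read_lock(void)   { atomic_fetch_add(&readers, 1); }
static void slot_read_unlock(void) { atomic_fetch_sub(&readers, 1); }

/* Result is only valid until the matching slot_read_unlock(). */
static struct model_page *slot_lookup(size_t idx)
{
	assert(idx < NR_SLOTS);
	return atomic_load(&slots[idx]);
}

/* Publish a freshly allocated page (updater side). */
static void slot_insert(size_t idx, struct model_page *page)
{
	assert(idx < NR_SLOTS);
	atomic_store(&slots[idx], page);
}

/*
 * Discard: unpublish now, free later (the role call_rcu() plays).
 * New readers see NULL; readers already holding the page keep using it.
 */
static void slot_discard(size_t idx)
{
	struct model_page *old;

	assert(idx < NR_SLOTS);
	old = atomic_exchange(&slots[idx], NULL);
	if (old) {
		assert(nr_deferred < MAX_DEFERRED);
		deferred[nr_deferred++] = old;
	}
}

/*
 * Free deferred pages once no reader is active. The pages were
 * unpublished before this seq_cst load, so any reader that could still
 * hold one of them is counted in readers. Returns the number freed.
 */
static size_t slot_reclaim(void)
{
	size_t freed = 0;

	if (atomic_load(&readers) != 0)
		return 0;	/* "grace period" not over; retry later */
	while (nr_deferred > 0) {
		free(deferred[--nr_deferred]);
		freed++;
	}
	return freed;
}
```
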
Or avoid the race by holding a page reference:

- grab a page reference in brd_lookup_page() when it is called from
  copy_to_brd() or copy_from_brd(), and drop it once the page has been
  consumed

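
The page-reference alternative can be modeled the same way: the lookup pins the page with an extra reference while still holding the lock that protects the page index, discard removes the page from the index and drops the index's own reference, and whoever drops the last reference frees the page. A minimal user-space sketch with C11 atomics, assuming an atomic_flag spinlock in place of whatever brd uses to protect its page index; ref_lookup_get(), ref_put(), ref_discard() and the other names are hypothetical stand-ins for brd_lookup_page() plus get_page()/put_page().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_SLOTS 16

struct ref_page {
	atomic_int refcount;		/* the index holds one reference */
	unsigned char data[4096];
};

static struct ref_page *table[NR_SLOTS];
static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static atomic_int pages_freed;		/* counts frees, for the demo only */

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Drop one reference; the last one frees the page (like put_page()). */
static void ref_put(struct ref_page *page)
{
	if (atomic_fetch_sub(&page->refcount, 1) == 1) {
		free(page);
		atomic_fetch_add(&pages_freed, 1);
	}
}

/* Insert a new, zeroed page; its initial reference belongs to the index. */
static int ref_insert(size_t idx)
{
	struct ref_page *page = calloc(1, sizeof(*page));

	assert(idx < NR_SLOTS);
	if (!page)
		return -1;
	atomic_init(&page->refcount, 1);
	table_lock_acquire();
	assert(table[idx] == NULL);
	table[idx] = page;
	table_lock_release();
	return 0;
}

/*
 * Lookup + get under the index lock, so the page cannot be freed between
 * finding and pinning it. The caller must ref_put() when done.
 */
static struct ref_page *ref_lookup_get(size_t idx)
{
	struct ref_page *page;

	assert(idx < NR_SLOTS);
	table_lock_acquire();
	page = table[idx];
	if (page)
		atomic_fetch_add(&page->refcount, 1);
	table_lock_release();
	return page;
}

/* Discard: unlink from the index, then drop the index's reference. */
static void ref_discard(size_t idx)
{
	struct ref_page *page;

	assert(idx < NR_SLOTS);
	table_lock_acquire();
	page = table[idx];
	table[idx] = NULL;
	table_lock_release();
	if (page)
		ref_put(page);
}
```

Compared with the RCU model, this costs an extra atomic get/put per copy, but a discarded page is freed as soon as its last user drops the reference.
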
Thanks,
Ming