On Mon, Dec 29, 2025 at 4:21 PM Pratyush Yadav <[email protected]> wrote:
>
> On Tue, Dec 23 2025, Pasha Tatashin wrote:
>
> > On Sat, Dec 6, 2025 at 6:03 PM Pratyush Yadav <[email protected]> wrote:
> >>
> >> HugeTLB manages its own pages. It allocates them on boot and uses those
> >> to fulfill hugepage requests.
> >>
> >> To support live update for a hugetlb-backed memfd, it is necessary to
> >> track how many pages of each hstate are coming from live update. This is
> >> needed to ensure the boot time allocations don't over-allocate huge
> >> pages, causing the rest of the system unexpected memory pressure.
> >>
> >> For example, say the system has 100G memory and it uses 90 1G huge
> >> pages, with 10G put aside for other processes. Now say 5 of those pages
> >> are preserved via KHO for live updating a huge memfd.
> >>
> >> But during boot, the system will still see that it needs 90 huge pages,
> >> so it will attempt to allocate those. When the file is later retrieved,
> >> those 5 pages also get added to the huge page pool, resulting in 95
> >> total huge pages. This exceeds the original expectation of 90 pages, and
> >> ends up wasting memory.
> >>
> >> LUO has file-lifecycle-bound (FLB) data to keep track of global state of
> >> a subsystem. Use it to track how many huge pages are used up for each
> >> hstate. When a file is preserved, it will increment the counter, and
> >> when it is unpreserved, it will decrement it. During boot time
> >> allocations, this data can be used to calculate how many hugepages
> >> actually need to be allocated.
> >>
> >> Design note: another way of doing this would be to preserve the entire
> >> set of hugepages using the FLB, skip boot time allocation, and restore
> >> them all on FLB retrieve. The main problem with that approach is that it
> >> would need to freeze all hstates after serializing them.
> >> This will need a lot more invasive changes in hugetlb since there are
> >> many ways folios can be added to or removed from a hstate. Doing it
> >> this way is simpler and less invasive.
> >>
> >> Signed-off-by: Pratyush Yadav <[email protected]>
> >> ---
> >>  Documentation/mm/memfd_preservation.rst |   9 ++
> >>  MAINTAINERS                             |   1 +
> >>  include/linux/kho/abi/hugetlb.h         |  66 +++++++++
> >>  kernel/liveupdate/Kconfig               |  12 ++
> >>  mm/Makefile                             |   1 +
> >>  mm/hugetlb.c                            |   1 +
> >>  mm/hugetlb_internal.h                   |  15 ++
> >>  mm/hugetlb_luo.c                        | 179 ++++++++++++++++++++++++
> >>  8 files changed, 284 insertions(+)
> >>  create mode 100644 include/linux/kho/abi/hugetlb.h
> >>  create mode 100644 mm/hugetlb_luo.c
> >>
> [...]
> >> +static int hugetlb_flb_retrieve(struct liveupdate_flb_op_args *args)
> >> +{
> >> +	/*
> >> +	 * The FLB is only needed for boot-time calculation of how many
> >> +	 * hugepages are needed. This is done by early boot handlers already.
> >> +	 * Free the serialized state now.
> >> +	 */
> >
> > It should be done in this function.
>
> The calculations can't be done in retrieve. Retrieve happens only once
> and for the whole FLB. They will need to come from
> hugetlb_hstate_alloc_pages().
>
> Maybe you mean getting rid of liveupdate_flb_incoming_early()? Yeah,
> that I can do. It will make this function a no-op once we move the
> kho_restore_free() to finish().

Yeah, this is what I meant.

Thanks,
Pasha

>
> >
> >> +	kho_restore_free(phys_to_virt(args->data));
> >
> > This should be moved to finish() after blackout.
>
> Sure.
>
> >
> >> +
> >> +	/*
> >> +	 * HACK: But since LUO FLB still needs an obj, use ZERO_SIZE_PTR to
> >> +	 * satisfy it.
> >> +	 */
> >> +	args->obj = ZERO_SIZE_PTR;
> >
> > Hopefully this is not needed any more with the updated FLB, please check :-)
>
> Yep. IIRC when I sent this series the older version of FLB was in
> mm-nonmm-unstable.
>
> >
> >> +	return 0;
> >> +}
> >> +
> [...]
>
> --
> Regards,
> Pratyush Yadav
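
For reference, the boot-time arithmetic from the quoted commit message (90
requested pages, 5 preserved, so only 85 left to allocate) can be sketched in
plain C. This is a minimal userspace illustration under stated assumptions,
not code from the series: struct hstate_preserved, preserved_for_order() and
boot_alloc_target() are hypothetical names, and the real FLB layout in
include/linux/kho/abi/hugetlb.h is not shown in this thread.

```c
#include <stddef.h>

/*
 * Hypothetical per-hstate record. The actual serialized format is defined
 * by the series and is not reproduced here.
 */
struct hstate_preserved {
	unsigned int order;		/* page order identifying the hstate */
	unsigned long nr_preserved;	/* pages carried over by live update */
};

/* Return the preserved count for one hstate, or 0 if it has no entry. */
unsigned long preserved_for_order(const struct hstate_preserved *tbl,
				  size_t n, unsigned int order)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].order == order)
			return tbl[i].nr_preserved;
	}
	return 0;
}

/*
 * Pages the boot-time allocator still has to provide so that the pool ends
 * up at 'requested' once the preserved pages are added back on retrieve.
 * Clamped at 0 so the subtraction can never wrap around.
 */
unsigned long boot_alloc_target(unsigned long requested,
				unsigned long preserved)
{
	return preserved >= requested ? 0 : requested - preserved;
}
```

With the numbers from the example, boot_alloc_target(90, 5) gives 85, so the
pool is back at 90 pages after the 5 preserved ones are returned to it. The
clamp covers the case where more pages were preserved than are requested on
the new kernel's command line; how the series itself handles that case is
not stated in this thread.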
