On Fri, Apr 17, 2026 at 11:50:58AM +0200, David Hildenbrand (Arm) wrote:
> On 4/16/26 03:24, Gregory Price wrote:
> > On Wed, Apr 15, 2026 at 12:47:50PM -0700, Frank van der Linden wrote:
> >>
> > 1GB ZONE_MOVABLE HugeTLBFS Pages is an example weird carve-out, because
> > the memory is in ZONE_MOVABLE to help make 1GB allocations more
> > reliable, but 1GB movable pages were removed from the kernel because
> > they're not easily migrated (and therefore may block hot-unplug).
> >
> > (Thankfully they're back now, so VMs can live on this memory :P)
>
> Heh, but longterm-pinning would fail on them (making vfio with VMs
> angry). Similar to CMA hugetlb.
>

Yeah, depends how you configure things.

As long as you expose those pages on a separate memfd and online it in
ZONE_MOVABLE in the guest to keep vfio from touching it - you can have
your cake and eat it too.

It's a bit of a bodge, but it works.

However...

> In the latter case, we should have a way to identify "this allocation is
> actually from the CMA owner, so longterm pinning is perfectly fine".
> Checking the CMA alloc state would be one approach, but that's rather
> nasty. I guess there would be ways to make that work.
>
> I'd assume that people barely rely on 1GB ZONE_MOVABLE HugeTLBFS Pages
> (iow, mixing kernel-cmdline ZONE_MOVABLE creation with kernel-cmdline
> hugetlb reservation).
>
> I'll note that there was long long ago a proposal of converting
> ZONE_MOVABLE to "sticky-movable" page blocks. It wouldn't really solve
> this problem, though, where the early boot code just does something
> that's rather stupid.
>

I have been toying with hotpluggable CMA regions.

Interesting opportunity:

  Hotplug on a private node w/ (RECLAIM | DEMOTION | CMA | HUGETLBFS)

Now you have exactly two enabled consumers:
  1) HugeTLBFS
  2) vmscan.c demotion logic

In this regard, HugeTLBFS is the only one that can reach these pages in
a way that could result in the pages being pinned.

All other pages on the node are - by definition - movable, because they
can only reach the node via migration (demotion).

The system can't do fallback allocations to the node, so it operates a
bit slower as a general purpose memory pool - but if you decide you want
to optimize for that, you can unplug/hotplug the memory back to a normal
node in ZONE_MOVABLE - without rebooting.

~Gregory
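
A rough userspace sketch of the private-node reasoning above, in case it
helps to see it spelled out. Everything named here (the PNODE_* bits,
struct private_node, enum page_source, and both helpers) is hypothetical
and made up for illustration; none of it is an existing kernel identifier
or interface. It only models the claim that HugeTLBFS and demotion are the
two ways onto the node, and that only the HugeTLBFS path can produce
pinned pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits for a private node (illustrative names,
 * not kernel identifiers). */
enum private_node_cap {
    PNODE_RECLAIM   = 1u << 0,
    PNODE_DEMOTION  = 1u << 1,
    PNODE_CMA       = 1u << 2,
    PNODE_HUGETLBFS = 1u << 3,
};

/* Ways a page could try to arrive on the node in this toy model. */
enum page_source {
    SRC_HUGETLBFS,  /* allocated directly by HugeTLBFS */
    SRC_DEMOTION,   /* migrated in by the vmscan demotion path */
    SRC_FALLBACK,   /* ordinary fallback allocation */
};

struct private_node {
    unsigned int caps;
};

/* Can a page from this source land on the node at all? */
bool pnode_accepts(const struct private_node *node, enum page_source src)
{
    assert(node != NULL);

    switch (src) {
    case SRC_HUGETLBFS:
        return (node->caps & PNODE_HUGETLBFS) != 0;
    case SRC_DEMOTION:
        return (node->caps & PNODE_DEMOTION) != 0;
    case SRC_FALLBACK:
        /* Assumption from the text: a private node is never a
         * fallback allocation target. */
        return false;
    }
    return false;
}

/* Only pages that HugeTLBFS placed here may end up pinned. Pages that
 * arrived by demotion got here via migration, so the model treats them
 * as movable. */
bool pnode_may_be_pinned(const struct private_node *node,
                         enum page_source src)
{
    return src == SRC_HUGETLBFS && pnode_accepts(node, src);
}
```

With caps set to (PNODE_RECLAIM | PNODE_DEMOTION | PNODE_CMA |
PNODE_HUGETLBFS), pnode_accepts() is true for the HugeTLBFS and demotion
sources and false for fallback, and pnode_may_be_pinned() is true only for
the HugeTLBFS source.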
