> On 22 Feb 2026, at 09:48, Gregory Price <[email protected]> wrote:
> 
> Topic type: MM
> 
> Presenter: Gregory Price <[email protected]>
> 
> This series introduces N_MEMORY_PRIVATE, a NUMA node state for memory
> managed by the buddy allocator but excluded from normal allocations.
> 
> I present it with an end-to-end Compressed RAM service (mm/cram.c)
> that would otherwise not be possible (or would be considerably more
> difficult, be device-specific, and add to the ZONE_DEVICE boondoggle).
> 
> 
> TL;DR
> ===
> 
> N_MEMORY_PRIVATE is all about isolating NUMA nodes and then punching
> explicit holes in that isolation to do useful things we couldn't do
> before without re-implementing entire portions of mm/ in a driver.
> 
> 
> /* This is my memory. There are many like it, but this one is mine. */
> rc = add_private_memory_driver_managed(nid, start, size, name, flags,
>                                       online_type, private_context);
> 
> page = alloc_pages_node(nid, __GFP_PRIVATE, 0);
> 
> /* Ok but I want to do something useful with it */
> static const struct node_private_ops ops = {
>        .migrate_to     = my_migrate_to,
>        .folio_migrate  = my_folio_migrate,
>        .flags = NP_OPS_MIGRATION | NP_OPS_MEMPOLICY,
> };
> node_private_set_ops(nid, &ops);
> 
> /* And now I can use mempolicy with my memory */
> buf = mmap(...);
> mbind(buf, len, mode, private_node, ...);
> buf[0] = 0xdeadbeef;  /* Faults onto private node */
> 
> /* And to be clear, no one else gets my memory */
> buf2 = malloc(4096);  /* Standard allocation */
> buf2[0] = 0xdeadbeef; /* Can never land on private node */
> 
> /* But I can choose to migrate it to the private node */
> move_pages(0, 1, &buf, &private_node, NULL, ...);
> 
> /* And more fun things like this */
> 
> 
> Patchwork
> ===
> A fully working branch based on cxl/next can be found here:
> https://github.com/gourryinverse/linux/tree/private_compression
> 
> A QEMU device which can inject high/low interrupts can be found here:
> https://github.com/gourryinverse/qemu/tree/compressed_cxl_clean
> 
> The additional patches on these branches are CXL and DAX driver
> housecleaning only tangentially relevant to this RFC, so I've
> omitted them for the sake of trying to keep it somewhat clean
> here.  Those patches should (hopefully) be going upstream anyway.
> 
> Patches 1-22: Core Private Node Infrastructure
> 
>  Patch  1:      Introduce N_MEMORY_PRIVATE scaffolding
>  Patch  2:      Introduce __GFP_PRIVATE
>  Patch  3:      Apply allocation isolation mechanisms
>  Patch  4:      Add N_MEMORY nodes to private fallback lists
>  Patches 5-9:   Filter operations not yet supported
>  Patch 10:      free_folio callback
>  Patch 11:      split_folio callback
>  Patches 12-20: mm/ service opt-ins:
>                   Migration, Mempolicy, Demotion, Write Protect,
>                   Reclaim, OOM, NUMA Balancing, Compaction,
>                   LongTerm Pinning
>  Patch 21:      memory_failure callback
>  Patch 22:      Memory hotplug plumbing for private nodes
> 
> Patch 23: mm/cram -- Compressed RAM Management
> 
> Patches 24-27: CXL Driver examples
>  Sysram Regions with Private node support
>  Basic Driver Example: (MIGRATION | MEMPOLICY)
>  Compression Driver Example (Generic)
> 
Hi,

As this is about to be discussed at the conference, I thought I'd
share some high-level comments.

I have been testing this for some time on a device with compression
(after some necessary fixes to get CXL RCD working, which Greg helped
me with).

Overall, the isolation property that this provides is something I deem
necessary for this technology. Others are better placed to judge the MM
plumbing itself, but I wanted to say that, from the device/use-case side,
this functionality is an important piece of the puzzle.

For cram itself, as it is in this RFC, I think there is still performance
and value left on the table (as noted in the description), but I fully
understand Gregory’s premise in approaching it this way.

<snip>
> 
> Future CRAM : Loosening the read-only constraint
> ===
> 
> The read-only model is safe but conservative.  For workloads where
> compressed pages are occasionally written, the promotion fault adds
> latency.  A future optimization could allow a tunable fraction of
> compressed pages to be mapped writable, accepting some risk of
> write-driven decompression in exchange for lower overhead.
> 
> The private node ops make this straightforward:
> 
>  - Adjust fixup_migration_pte to selectively skip
>    write-protection.
>  - Use the backpressure system to either revoke writable mappings,
>    deny additional demotions, or evict when device pressure rises.
I have some quick hacks playing with these ideas, but I haven’t had the
time to test them thoroughly and get to something robust yet. I saw in
another thread that there is a follow-up cooking, which looks interesting.

Thanks Greg for pushing this, and I’m happy to test more on HW in our lab.

Best,
/Yiannis



