On Thu, 5 Feb 2026 17:48:47 +0000
Jonathan Cameron <[email protected]> wrote:

Hi Jonathan,

Thanks for the clarifications.

Quick thoughts inline.

> > > I'm not clear if sysram could be used for virtio, or even needed.  I'm
> > > still figuring out how virtio of simple memory devices is a gain.
> > >     
> > 
> > Jonathan mentioned that he thinks it would be possible to just bring it
> > online as a private-node and inform the consumer of this.  I think
> > that's probably reasonable.  
> 
> Firstly VM == Application.  If we have say a DB that wants to do everything
> itself, it would use the same interface as a VM to get the whole memory
> on offer. (I'm still trying to get that Application Specific Memory term
> adopted ;) 
> 
> This would be better if we didn't assume anything to do with virtio
> - that's just one option (and right now for CXL mem probably not the
> sensible one as it's missing too many things we get for free by just
> emulating CXL devices - e.g. all the stuff you are describing here
> for the host is just as valid in the guest). We have a path to
> get that emulation and should have the big missing piece posted shortly
> (DCD backed by 'things - this discussion' that turn up after VM boot).
> 
> The real topic is memory for a VM, and we need a way to tie a memory
> backend in qemu to that capacity, so that whatever the fabric manager
> provided for that VM is given to the VM and not used for anything else.
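FWIW, on the qemu side I'd expect that tie-in to end up looking much like
the existing file-backed case once we have a handle; the path and size
below are made up:

  -object memory-backend-file,id=vm-mem0,share=on,mem-path=/dev/dax0.0,size=16G
  -numa node,memdev=vm-mem0
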
> 
> If it's for a specific VM, then it's tagged, as otherwise how else would
> we know the intent? (Let's ignore random other out-of-band paths.)
> 
> Layering wise we can surface as many backing sources as we like at
> runtime via 1+ emulated DCD devices (to give perf information etc).
> They each show up in the guest as a contiguous (maybe tagged) single
> extent, and then we apply whatever comes out of the rest of this
> discussion on top of that.
> 
> So all we care about is how the host presents it.
> 
> Bunch of things might work for this.
> 
> 1. Just put it in a numa node that requires specific selection to allocate
>    from.  This is nice because it just looks like normal memory and we
>    can apply any type of front end on top of that.  Not good if we have a lot
>    of these coming and going.
> 
> 2. Provide it as something with an fd we can mmap. I was fine with Dax for
>    this but if it's normal ram just for a VM, anything that gives me a handle
>    that I can mmap is fine. Just need a way to know which one (so a tag).
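
Taking these in turn, for 1. the consumer side would look roughly like
the sketch below; node 2 is an assumption standing in for wherever the
capacity lands, and mbind() here is the libnuma wrapper (build with
-lnuma):

  /* Bind an anonymous mapping to a hypothetical CXL node 2.
   * MPOL_BIND so we fail rather than fall back to other nodes. */
  #include <numaif.h>
  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
          size_t len = 1UL << 30;                 /* arbitrary 1 GiB */
          unsigned long nodemask = 1UL << 2;      /* only node 2 */
          void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          if (p == MAP_FAILED)
                  return 1;
          if (mbind(p, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0)) {
                  perror("mbind");
                  return 1;
          }
          /* faults on p now only ever pull pages from node 2 */
          return 0;
  }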

I think both of these approaches are OK, but from a developer's
perspective: if someone wants specific memory for their workload, they
would rather get an fd and use it however they want. NUMA may not give
that much flexibility. As a developer I would prefer 2 (a sketch below),
though whether that ends up being dax or something else I'm not sure.
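
For 2. the consumer side is just as small. A rough sketch; the
/dev/dax0.0 path and the size are assumptions, the real ones would come
from whatever the tag lookup resolves to (dax device, memfd, ...):

  /* Map the whole capacity handed to this consumer via an fd. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = open("/dev/dax0.0", O_RDWR);   /* hypothetical handle */
          size_t len = 16UL << 30;                /* real size from sysfs */
          void *p;

          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          /* devdax wants MAP_SHARED; writes go straight to the device */
          p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }
          /* ... hand p/len to the workload ... */
          return 0;
  }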
> 
> It's pretty similar for shared cases. Just need a handle to mmap.
> In that case, tag goes straight up to guest OS (we've just unwound the
> extent ordering in the host and presented it as a contiguous single
> extent).
> 
> The assumption here is that we always provide to the VM all of the
> capacity that was tagged for it.  Things may get more entertaining if we have
> a bunch of capacity that was tagged to provide extra space for a set of
> VMs (e.g. we overcommit on top of the DCD extents) - to me that's a
> job for another day.
> 
> So I'm not really envisioning anything special for the VM case; it's
> just a dedicated allocation of memory for a user who knows how to get it.
> We will want a way to get perf info though, so we can provide that
> in the VM.  Maybe we can figure that out from the CXL HW backing it
> without needing anything special in what is being discussed here.
> 
> Jonathan
> 
> > 
> > ~Gregory  
> 
