Gregory Price wrote:
> On Sun, Apr 13, 2025 at 05:52:08PM -0500, Ira Weiny wrote:
> > A git tree of this series can be found here:
> > 
> >     https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13
> > 
> > This is now based on 6.15-rc2.
> > 
> 
> Extreme necro-bump for this set, but i wonder what folks opinion is on
> DCD support if we expose a new region control pattern ala:
> 
> https://lore.kernel.org/linux-cxl/[email protected]/
> 
> The major difference would be elimination of sparse-DAX, which i know

Sparse-dax is somewhat of a misnomer.  'Sparse regions' may have been a
better name for it, since that is really what we are speaking of: regions
which don't necessarily have memory backing the full size of the region.

In the DCD series I wrote, dax devices could only be created after extents
appeared.

> has been a concern, in favor of a per-region-driver policy on how to
> manage hot-add/remove events.

I think a concern would be that each region driver is implementing a
'policy' which requires new drivers for new policies.

My memory is very weak on all this stuff...

My general architecture was trying to expose the extent ranges to user
space and allow user space to build them into ranges with whatever policy
it wanted.

The tests[1] were all written to create dax devices on top of the extents
in certain ways to link together those extents.

[1] https://github.com/weiny2/ndctl/blob/dcd-region3-2025-04-13/test/cxl-dcd.sh

I did not like the 'implicit' nature of the association of dax device with
extent.  But it maintained backwards compatibility with non-sparse
regions...

My vision for tags was that eventually dax device creation could have a
tag specified prior and would only allocate from extents with that tag.
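
As a rough illustration of that idea (not code from the series; the
Extent type, its fields, and allocate_from_tag() are hypothetical names
made up for this sketch), tag-restricted allocation could look like:

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass
class Extent:
    """Hypothetical model of one DCD extent inside a sparse region."""
    offset: int    # byte offset within the region
    length: int    # extent size in bytes
    tag: UUID      # tag the extent arrived with
    used: int = 0  # bytes already handed out to dax devices


def allocate_from_tag(extents, tag, size):
    """Carve `size` bytes out of extents carrying `tag` only.

    Returns a list of (offset, length) ranges, lowest offset first.
    Raises ValueError if the tagged extents cannot satisfy the request.
    """
    matching = sorted((e for e in extents if e.tag == tag),
                      key=lambda e: e.offset)
    free = sum(e.length - e.used for e in matching)
    if free < size:
        raise ValueError(f"only {free} bytes free for tag {tag}")

    ranges = []
    remaining = size
    for e in matching:
        avail = e.length - e.used
        if avail == 0:
            continue
        take = min(avail, remaining)
        ranges.append((e.offset + e.used, take))
        e.used += take
        remaining -= take
        if remaining == 0:
            break
    return ranges
```

Extents with any other tag are never touched, so a request that cannot
be met from the named tag fails instead of silently spilling over.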

> 
> Things I've discussed with folks in different private contexts
> 
> sysram usecase:
> ----
>   echo regionN > decoder0.0/create_dc_region
>   /* configure decoders */
>   echo regionN > cxl/drivers/sysram/bind
> 
> tagged extents arrive and leave as a group, no sparseness
>     extents cannot share a tag unless they arrive together
>     e.g. set(A) & set(B) must have different tags
>     add and expose daxN.M/uuid as the tag for collective management

I'm not following this.  If set(A) arrives can another set(A) arrive
later?

How long does the kernel wait for all the 'A's to arrive?  Or must they
all arrive in a single set of extents chained with the 'more' bit?

Regardless IMO if user space was monitoring the extents with tag A they
can decide if and when all those extents have arrived and can build on top
of that.
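
A minimal sketch of that kind of user space check, assuming extents are
visible as (tag, offset, length) tuples and that the expected total size
for a tag is known out of band (both are assumptions of this example,
not interfaces from the series):

```python
from collections import defaultdict


def group_by_tag(extents):
    """Group (tag, offset, length) tuples into {tag: [(offset, length)]}."""
    groups = defaultdict(list)
    for tag, offset, length in extents:
        groups[tag].append((offset, length))
    return dict(groups)


def tag_complete(groups, tag, expected_bytes):
    """True once the extents seen for `tag` add up to `expected_bytes`.

    `expected_bytes` is hypothetical: it stands in for whatever the
    orchestrator told user space to expect for this tag.
    """
    seen = sum(length for _, length in groups.get(tag, []))
    return seen >= expected_bytes
```

User space can re-run the check on every extent event and only build a
device on top of tag A once it reports the set as complete.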

> 
> Can decide whether linux wants to support untagged extents
>     cxl_sysram could choose to track and hotplug untagged extents

'cxl_sysram' is the sysram region driver, right?

Are we expecting to have tagged and untagged extents on the same DCD
region?

I'm ok not supporting that.  I just want to be clear about what you are
suggesting.

Would the cxl_sysram region driver be attached to the DCD partition?  Then
it would have some DCD functionality built in...  I guess make a common
extent processing lib for the 2 drivers?

I feel like that is a lot of policy being built into the kernel.  Having
the DCD region driver simply tell user space 'Hey there is a new extent
here', and then having user space online that as sysram, makes the policy
decision in user space instead.

Segueing into the N_PRIVATE work: couldn't we assign that memory to a
NUMA node with N_PRIVATE-only memory via user space?  Then it is onlined
in a way that any app which is allocating from that node would get that
memory.  And keep it out of kernel space?

But keep all that policy in user space when an extent appears, not baked
into a particular driver.
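
To make the split concrete, here is a sketch of a user space policy
table keyed by tag.  The event dict layout, the handler names, and the
action tuples are all invented for illustration; nothing here is an
existing kernel or ndctl interface.

```python
# Hypothetical handlers: each returns the action user space would take.
def online_as_sysram(extent):
    return ("online-sysram", extent["offset"], extent["length"])


def create_dax_device(extent):
    return ("create-dax", extent["offset"], extent["length"])


def ignore(extent):
    return ("ignore", extent["offset"], extent["length"])


def make_dispatcher(policy_by_tag, default=ignore):
    """Build an 'extent added' callback from a tag -> handler table."""
    def on_extent_added(extent):
        handler = policy_by_tag.get(extent["tag"], default)
        return handler(extent)
    return on_extent_added
```

Changing policy then means editing the table in user space rather than
writing or binding a different region driver.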

>     directly without going through DAX. Partial release would be
>     possible on a per-extent granularity in this case.
> ----
> 
> 
> virtio usecase:  (making some stuff up here)
> ----
>   echo regionN > decoder0.0/create_dc_region
>   /* configure decoders */
>   echo regionN > cxl/drivers/virtio/bind
> 
> tags are required and may imply specific VM routing
>     may or may not use DAX under the hood
> 
> extents may be tracked individually and add/removed individually
>     if using DAX, this implies 1 device per extent.
>     This probably requires a minimum extent size to be reasonable.
> 
> Does not expose the memory as SysRAM, instead builds new interface
>     to handle memory management message routing to/from the VMM
>     (N_MEMORY_PRIVATE?)
> ----
> 
> 
> devdax usecase (FAMFS?)
> ---- 
>   echo regionN > decoder0.0/create_dc_region
>   /* configure decoders */
>   echo regionN > cxl/drivers/devdax/bind
> 
> All sets of extents appear as new DAX devices
> Tags are exposed via daxN.M/uuid
> Tags are required
>    otherwise you can't make sense of what that devdax represents
> ---
> 
> Begs the question:
>    Do we require tags as a baseline feature for all modes?

Previously no.  But I've often thought of no tag as just a special case of
tag == 0.  But we agreed at one time that untagged extents would have a
special 'no tag' meaning, such that it was just memory to be used however...
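
For illustration only, treating 'no tag' as the all-zero tag could be
expressed like this, assuming the tag is handled as a 16-byte UUID
(NO_TAG and is_untagged() are names made up for this sketch):

```python
from uuid import UUID

# Assumption for this sketch: "no tag" is represented as the nil UUID.
NO_TAG = UUID(int=0)


def is_untagged(tag_bytes: bytes) -> bool:
    """True if the 16-byte tag is all zeroes, i.e. tag == 0."""
    return UUID(bytes=tag_bytes) == NO_TAG
```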

>    No tag - no service.
>    Heavily implied:  Tags are globally unique (uuid)
> 
> But I think this resolves a lot of the disparate disagreements on "what
> to do with tags" and how to manage sparseness - just split the policy
> into each individual use-case's respective driver.

I think what I'm worried about is where that policy resides.

I think it is best to have a DCD region driver which simply exposes
extents and allows user space to control how those extents are used.  I
think some of what you have above works like that but I want to be careful
baking in policy.

> 
> If a sufficiently unique use-case comes along that doesn't fit the
> existing categories - a new region-driver may be warranted.

Again I don't like the idea of needing new drivers for new policies.  That
goes against how things should work in the kernel.

Ira
