Anisa Su wrote:
> On Sun, Apr 13, 2025 at 05:52:20PM -0500, Ira Weiny wrote:i
> A few notes while going through and removing sparse dax semantics and plumbing
> for fs-dax mode:
> > A dynamic capacity device (DCD) sends events to signal the host for
> > changes in the availability of Dynamic Capacity (DC) memory.  These
> > events contain extents describing a DPA range and meta data for memory
> > to be added or removed.  Events may be sent from the device at any time.
> > 
> > Three types of events can be signaled, Add, Release, and Force Release.
> > 
> > On add, the host may accept or reject the memory being offered.  If no
> > region exists, or the extent is invalid, the extent should be rejected.
> > Add extent events may be grouped by a 'more' bit which indicates those
> > extents should be processed as a group.
> > 
> > On remove, the host can delay the response until the host is safely not
> > using the memory.  If no region exists the release can be sent
> > immediately.  The host may also release extents (or partial extents) at
> > any time.
> Partial release is no longer valid for tagged release iirc from the calls

Tags were not supported in this version:

        if (!uuid_is_null((const uuid_t *)extent->uuid)) {
                dev_err_ratelimited(dev,
                                    "DC extent DPA %pra (%pU); tags not 
supported\n",
                                    &ext_range, extent->uuid);
                return -ENXIO;
        }

> 
> > Thus the 'more' bit grouping of release events is of less
> > value and can be ignored in favor of sending multiple release capacity
> > responses for groups of release events.
> > 
> [snip]
> > +
> > +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> > +                           struct xarray *extent_array, int cnt)
> > +{
> > +   struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> > +   struct cxl_mbox_dc_response *p;
> > +   struct cxl_extent *extent;
> > +   unsigned long index;
> > +   u32 pl_index;
> > +
> > +   size_t pl_size = struct_size(p, extent_list, cnt);
> > +   u32 max_extents = cnt;
> > +
> > +   /* May have to use more bit on response. */
> > +   if (pl_size > cxl_mbox->payload_size) {
> > +           max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> > +                         sizeof(struct updated_extent_list);
> > +           pl_size = struct_size(p, extent_list, max_extents);
> > +   }
> > +
> > +   struct cxl_mbox_dc_response *response __free(kfree) =
> > +                                           kzalloc(pl_size, GFP_KERNEL);
> > +   if (!response)
> > +           return -ENOMEM;
> > +
> > +   if (cnt == 0)
> > +           return send_one_response(cxl_mbox, response, opcode, 0, 0);
> > +
> > +   pl_index = 0;
> I was wondering why xarray is used here instead of a list? I didn't see 
> anywhere
> that we need to look up a specific index to benefit from the log complexity 
> and
> afaict, simply used to iterate over all elements.

xarray was just easier than a list.

> 
> > +   xa_for_each(extent_array, index, extent) {
> > +           response->extent_list[pl_index].dpa_start = extent->start_dpa;
> > +           response->extent_list[pl_index].length = extent->length;
> > +           pl_index++;
> > +
> > +           if (pl_index == max_extents) {
> > +                   u8 flags = 0;
> > +                   int rc;
> > +
> > +                   if (pl_index < cnt)
> > +                           flags |= CXL_DCD_EVENT_MORE;
> > +                   rc = send_one_response(cxl_mbox, response, opcode,
> > +                                          pl_index, flags);
> > +                   if (rc)
> > +                           return rc;
> > +                   cnt -= pl_index;
> > +                   pl_index = 0;
> > +           }
> > +   }
> > +
> > +   if (!pl_index) /* nothing more to do */
> > +           return 0;
> > +   return send_one_response(cxl_mbox, response, opcode, pl_index, 0);
> > +}
> > +

[snip]

> > +static int validate_add_extent(struct cxl_memdev_state *mds,
> > +                          struct cxl_extent *extent)
> > +{
> > +   int rc;
> > +
> > +   rc = cxl_validate_extent(mds, extent);
> > +   if (rc)
> > +           return rc;
> > +
> > +   return cxl_add_extent(mds, extent);
> > +}
> > +
> > +static int cxl_add_pending(struct cxl_memdev_state *mds)
> > +{
> > +   struct device *dev = mds->cxlds.dev;
> > +   struct cxl_extent *extent;
> > +   unsigned long cnt = 0;
> > +   unsigned long index;
> > +   int rc;
> > +
> Also according to the spec:
> "In response to an Add Capacity Event Record, or multiple Add Capacity Event 
> records grouped via the More flag (see Table 8-229), the host is expected to 
> respond with exactly one Add Dynamic Capacity Response acknowledgment, 
> corresponding to the order of the Add Capacity Events received. If the order 
> does not match, the device shall return Invalid Input. The Add Dynamic 
> Capacity
> Response acknowledgment must be sent in the same order as the Add Capacity 
> Event Records."

hmmm...  yea that might be wrong, I don't recall.

> 
> Using xarray does not preserve the order of the extents, which requires a fifo
> queue.

It could if the index was the order.

But in the end I'm not opposed to using a list.

Ira

[snip]

Reply via email to