Core: MEM_MGT_EXTENSIONS support

Steve Wise Mon, 19 May 2008 06:40:41 -0700

Or Gerlitz wrote:

Steve Wise wrote:
Support for the IB BMME and iWARP equivalent memory extensions to nonshared memory regions. Usage Model:
- MR allocated with ib_alloc_mr()
- Page lists allocated via ib_alloc_fast_reg_page_list().
- MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR)
- MR made INVALID via ib_post_send(IB_WR_INVALIDATE_MR)
- MR deallocated with ib_dereg_mr()
- Page lists deallocated via ib_free_fast_reg_page_list().
Steve,
Does this design go hand-in-hand with remote invalidation, such that if the remote side invalidated the mapping there is no need to issue the IB_WR_INVALIDATE_MR work request?


Yes.

Also, does the proposed design support fmr pages of a granularity different from the OS ones? For example, the OS pages are 4K and the ULP wants to use an fmr of 512-byte "pages" (the "block lists" feature), etc. In that case, doesn't the size of each page have to be specified as a param to the alloc_fast_reg_mr() verb?

Page size is passed in at registration time. At allocation time, the HW only needs to know what the max page list length (or PBL depth) will ever be so it can pre-allocate that at alloc time. Then the actual page list length, the page size of each entry in the page list, as well as the page list itself are passed in via the post_send(IB_WR_FAST_REG_MR) work request. See the fast_reg union in struct ib_send_wr.
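To make the split between alloc-time and registration-time parameters concrete, here is a minimal userspace sketch of the arithmetic involved. The helper names are hypothetical and not part of the patch; it also assumes the page size is expressed as a shift (page_shift), which is only a suggestion in this thread. It computes how many page-list entries a registration would need and checks that against the depth fixed at allocation time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): number of page-list entries
 * needed to cover [iova, iova + length) with pages of (1 << page_shift)
 * bytes. The offset of iova within its first page plays the role of
 * first_byte_offset in the fast_reg work request.
 */
static inline uint64_t fast_reg_pages_needed(uint64_t iova, uint32_t length,
                                             unsigned int page_shift)
{
    uint64_t page_mask;
    uint64_t first_byte_offset;

    assert(page_shift <= 31);   /* keeps the sum below well inside 64 bits */
    if (length == 0)
        return 0;

    page_mask = ((uint64_t)1 << page_shift) - 1;
    first_byte_offset = iova & page_mask;

    /* Round (offset + length) up to a whole number of pages. */
    return (first_byte_offset + length + page_mask) >> page_shift;
}

/*
 * Hypothetical check: does the registration fit in the page-list depth
 * that was chosen when the MR was allocated?
 */
static inline int fast_reg_fits(uint64_t iova, uint32_t length,
                                unsigned int page_shift,
                                unsigned int max_page_list_len)
{
    return fast_reg_pages_needed(iova, length, page_shift) <= max_page_list_len;
}
```

With 512-byte "pages" (page_shift of 9), an aligned 4K buffer needs 8 entries, and one more if it starts in the middle of a block, so the max page list length chosen at allocation time has to account for the smallest page size the ULP intends to use.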


Applications can allocate a fast_reg mr once, and then can repeatedly
bind the mr to different physical memory SGLs via posting work requests
to the send queue.  For each outstanding mr-to-pbl binding in the SQ
pipe, a fast_reg_page_list needs to be allocated.  Thus pipelining can
be achieved while still allowing device-specific page_list processing.

mmm, is it a must for the ULP to issue a page list alloc/free per IB_WR_FAST_REG_MR call?

No, they can be reused as needed. They typically will only get allocated once, used many times, then freed when the application is done. My point in the text above was that an application could allocate N page lists and use them in a pipeline for the same fast reg mr by fencing things appropriately in the SQ.
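As a rough illustration of the "allocate N page lists once and reuse them" idea, the following sketch tracks N slots as a ring. Everything here is hypothetical (PL_POOL_DEPTH, struct page_list_pool and both helpers are illustration only), and it assumes fast-reg work requests on one SQ complete in posting order, so the oldest slot is always the one released. A real ULP would keep a struct ib_fast_reg_page_list pointer per slot.

```c
#include <assert.h>

#define PL_POOL_DEPTH 4         /* hypothetical N: page lists kept per MR */

/* Bookkeeping only; the actual page lists would live alongside this. */
struct page_list_pool {
    unsigned int head;          /* next slot to hand out */
    unsigned int in_flight;     /* slots bound to WRs still in the SQ */
};

/*
 * Take the next free slot before posting IB_WR_FAST_REG_MR.
 * Returns the slot index, or -1 if all N page lists are still in use.
 */
static inline int pl_pool_acquire(struct page_list_pool *p)
{
    int slot;

    if (p->in_flight == PL_POOL_DEPTH)
        return -1;

    slot = (int)p->head;
    p->head = (p->head + 1) % PL_POOL_DEPTH;
    p->in_flight++;
    return slot;
}

/* Call once the oldest outstanding registration is no longer needed. */
static inline void pl_pool_release(struct page_list_pool *p)
{
    assert(p->in_flight > 0);
    p->in_flight--;
}
```

The pool is filled once at setup and drained at teardown, so no page list alloc/free happens on the fast path.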

--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -676,6 +683,20 @@ struct ib_send_wr {
             u16    pkey_index; /* valid for GSI only */
             u8    port_num;   /* valid for DR SMPs on switch only */
         } ud;
+        struct {
+            u64                iova_start;
+            struct ib_mr             *mr;
+            struct ib_fast_reg_page_list    *page_list;
+            unsigned int            page_size;
+            unsigned int            page_list_len;
+            unsigned int            first_byte_offset;
+            u32                length;
+            int                access_flags;

++ } fast_reg;

+        struct {
+            struct ib_mr     *mr;
+        } local_inv;
     } wr;
 };

I suggest using a "page_shift" notation and not "page_size" to comply with the kernel semantics of other APIs.

Ok, I wondered about that.  It will also ensure a power of two.
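A small userspace sketch of why a shift guarantees a power of two while a raw size does not. The helper names are made up for illustration and are not kernel or patch APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Any page_shift below 64 yields a valid power-of-two page size. */
static inline uint64_t page_size_from_shift(unsigned int page_shift)
{
    assert(page_shift < 64);
    return (uint64_t)1 << page_shift;
}

/* With a raw page_size field, the provider would need a check like this. */
static inline int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

For example, a page_shift of 12 gives 4096 and a page_shift of 9 gives 512, whereas a page_size of 3000 could be passed in and would have to be rejected explicitly.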

Steve.
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [ofa-general] [PATCH RFC v3 1/2] RDMA/Core: MEM_MGT_EXTENSIONS support