Sean,
Great that you are taking this on!  I will review this next week.

-Jack

On Tuesday 17 May 2011 00:13, Hefty, Sean wrote:
> I've been working on a set of XRC patches aimed at upstream inclusion to the 
> kernel, libibverbs, and librdmacm.  I'm using existing patches as the major 
> starting point.  A goal is to maintain the user space ABI.  Before proceeding 
> further, I wanted to get broader feedback.  Starting at the top and working 
> down, these are the basic ideas:
> 
> 
> librdmacm
> ---------
> The API is basically unchanged.  XRC usage is indicated through the QP type.  
> The challenge is determining if XRC maps to a specific rdma_port_space.
> 
> 
> libibverbs
> ----------
> We define a new device capability flag IBV_DEVICE_EXT_OPS, indicating that 
> the library supports extended operations.  If set, the provider library 
> returns an extended structure from ibv_open_device():
> 
>       struct ibv_context_ext {
>               struct ibv_context context;
>               int                version;
>               struct ibv_ext_ops ext_ops;
>       };
> 
> The ext_ops will allow for additional operations not provided by 
> ibv_context_ops, for example:
> 
>       struct ibv_ext_ops {
>               int     (*share_pd)(struct ibv_pd *pd, int fd, int oflags);
>       };
> 
> In order for libibverbs to check for ext_ops support, it steals a byte from 
> the device name:
> 
>       /*
>        * Support for extended operations is recorded at the end of
>        * the name character array.
>        */
>       #define ext_ops_cap            name[IBV_SYSFS_NAME_MAX - 1]
> 
> (If strlen(name) indicates that this byte terminates the string, extended 
> operation support is disabled for this device.)
> 
> Hopefully, this provides the framework needed for libibverbs to support both 
> old and new provider libraries.
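
To make the capability-byte check concrete, here is a rough sketch of how libibverbs might test for ext_ops before touching the extended context. The struct bodies are trimmed stand-ins rather than the real libibverbs definitions, and device_has_ext_ops() and get_ext_ops() are hypothetical helper names, not part of any existing patch.

```c
#include <stddef.h>
#include <string.h>

/* Same value libibverbs uses; redefined so the sketch stands alone. */
#define IBV_SYSFS_NAME_MAX 64

/* Trimmed, hypothetical stand-ins for the real libibverbs types. */
struct ibv_pd;
struct ibv_context { int placeholder; };
struct ibv_device  { char name[IBV_SYSFS_NAME_MAX]; };

struct ibv_ext_ops {
	int (*share_pd)(struct ibv_pd *pd, int fd, int oflags);
};

struct ibv_context_ext {
	struct ibv_context context;	/* must stay first for the cast below */
	int                version;
	struct ibv_ext_ops ext_ops;
};

/* Capability flag lives in the last byte of the name array. */
#define ext_ops_cap name[IBV_SYSFS_NAME_MAX - 1]

/* Hypothetical helper: nonzero if the provider flagged ext_ops support. */
static int device_has_ext_ops(const struct ibv_device *dev)
{
	const char *nul = memchr(dev->name, '\0', IBV_SYSFS_NAME_MAX);

	/* Unterminated name, or the last byte is the terminator itself:
	 * the byte is not free to carry the flag, so treat as unsupported. */
	if (!nul || (size_t)(nul - dev->name) == IBV_SYSFS_NAME_MAX - 1)
		return 0;
	return dev->ext_ops_cap != 0;
}

/* Hypothetical helper: extended ops for ctx, or NULL for old providers. */
static struct ibv_ext_ops *get_ext_ops(const struct ibv_device *dev,
				       struct ibv_context *ctx)
{
	if (!device_has_ext_ops(dev))
		return NULL;
	/* Valid only because the provider really allocated an ibv_context_ext. */
	return &((struct ibv_context_ext *)ctx)->ext_ops;
}
```

An old provider never sets the byte, so callers get NULL and fall back to the plain ibv_context_ops path.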
> 
> From an architecture viewpoint, XRC adds 4 new XRC specific objects: domains, 
> INI QPs, TGT QPs, and SRQs.  For the purposes of the libibverbs API only, I'm 
> suggesting the following mappings:
> 
> XRC domains - Hidden under a PD, dynamically allocated when needed.  An 
> extended ops call allows the xrcd to be shared between processes.  This 
> minimizes changes to existing structures and APIs which only take a struct 
> ibv_pd.
> 
> INI QPs - Exposed through a new IBV_QPT_XRC_SQ qp type.  This is a send-only 
> QP with minimal differences from an RC QP from a user's perspective.
> 
> TGT QPs - Not exposed to user space.  XRC TGT QP creation and setup is 
> handled by the kernel.
> 
> XRC SRQs - Exposed through a new IBV_QPT_XRC_RQ qp type.  This is an SRQ that 
> is tracked using a struct ibv_qp.  This minimizes API changes to both 
> libibverbs and librdmacm.
> 
> If ext_ops are supported and in active use, extended structures may be 
> expected with some calls, such as ibv_post_send() requiring a struct 
> ibv_xrc_send_wr for XRC QPs.
> 
>       struct ibv_xrc_send_wr {
>               struct ibv_send_wr wr;
>               uint32_t remote_qpn;
>       };
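
A sketch of how a provider's post_send path might recover the extended work request. It relies only on the wrapper embedding struct ibv_send_wr as its first member. The ibv_send_wr body is a trimmed stand-in, and the enum and xrc_remote_qpn() are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed, hypothetical stand-in for the real libibverbs struct. */
struct ibv_send_wr {
	uint64_t            wr_id;
	struct ibv_send_wr *next;
};

struct ibv_xrc_send_wr {
	struct ibv_send_wr wr;		/* first member: &xwr->wr aliases xwr */
	uint32_t           remote_qpn;
};

/* Hypothetical QP type tags, standing in for the proposed IBV_QPT_* values. */
enum sketch_qp_type { SKETCH_QPT_RC, SKETCH_QPT_XRC_SQ };

/*
 * Hypothetical helper: return the remote QPN carried by an XRC work
 * request, or 0 when the QP is not an XRC send QP. The downcast is only
 * safe when the caller really passed an ibv_xrc_send_wr, which is why
 * the QP type is checked first.
 */
static uint32_t xrc_remote_qpn(enum sketch_qp_type type,
			       const struct ibv_send_wr *wr)
{
	if (type != SKETCH_QPT_XRC_SQ)
		return 0;
	return ((const struct ibv_xrc_send_wr *)wr)->remote_qpn;
}
```

The caller fills in a struct ibv_xrc_send_wr and passes &xwr.wr where ibv_post_send() expects a struct ibv_send_wr *, so the existing prototype stays unchanged.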
> 
> 
> uverbs
> ------
> (Ideas for kernel changes are sketchier, but the existing patches cover most 
> of the functionality except for IB CM interactions.)
> 
> Need new uverbs commands to support alloc/dealloc xrcd and create xrc srq.  
> Create QP must handle XRC INI QPs.  XRC TGT QPs are not exposed; ***all XRC 
> INI->TGT QP setup is done in band***.
> 
> Somewhere, an xrc sub-module listens on a SID and accepts incoming XRC 
> connection requests.  This requires associating the xrcd and SID, the details 
> of which I'm not clear on.  The xrcd is most readily known to uverbs, but a 
> SID is usually formed by the rdma_cm.  Even how the latter is done is unclear.
> 
> The usage model I envision is for a user to call listen on an XRC SRQ 
> (IBV_QPT_XRC_RQ), which listens for a SIDR REQ to resolve the SRQN and a REQ 
> to set up the INI->TGT QPs.  The issue is sync'ing the lifetime of any formed 
> connections with the xrcd.
> 
> 
> verbs
> -----
> The patch for this is basically available.  3 new calls are added: 
> ib_create_xrc_srq, ib_alloc_xrcd, and ib_dealloc_xrcd.  The IB_QPT_XRC is 
> split into 2 types: IB_QPT_INI_XRC and IB_QPT_TGT_XRC.  An INI QP has a pd, 
> but no xrcd, while the TGT QP is the reverse.
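
The pd/xrcd rule for the split QP types could be enforced with a check along these lines. This is only a sketch: the structs are empty stand-ins, the enum carries just the values needed here, and check_qp_attrs() is a hypothetical name. How other QP types are treated (pd required) is an assumption.

```c
#include <stddef.h>

/* Empty, hypothetical stand-ins for the kernel objects. */
struct ib_pd   { int unused; };
struct ib_xrcd { int unused; };

/* Only the types needed for the sketch; names follow the proposal. */
enum ib_qp_type_sketch { IB_QPT_RC, IB_QPT_INI_XRC, IB_QPT_TGT_XRC };

/*
 * Hypothetical helper: return 0 if the pd/xrcd combination is valid for
 * the QP type, -1 otherwise (a real implementation would likely return
 * -EINVAL).
 */
static int check_qp_attrs(enum ib_qp_type_sketch type,
			  const struct ib_pd *pd, const struct ib_xrcd *xrcd)
{
	switch (type) {
	case IB_QPT_INI_XRC:	/* INI QP: has a pd, no xrcd */
		return (pd && !xrcd) ? 0 : -1;
	case IB_QPT_TGT_XRC:	/* TGT QP: has an xrcd, no pd */
		return (!pd && xrcd) ? 0 : -1;
	default:		/* assumption: other types just need a pd */
		return pd ? 0 : -1;
	}
}
```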
> 
> 
> Existing patches to the mlx4 driver and library would be modified to handle 
> these changes.  If anyone has any thoughts on these changes, I'd appreciate 
> them before I have them implemented.  :)
> 
> - Sean
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 