On Thu, 2015-05-21 at 13:21 -0400, Doug Ledford wrote:
> On Thu, 2015-05-21 at 13:52 +0000, Wan, Kaike wrote:
> > In our previous posting to the mailing list, we proposed to send a MAD 
> > request from kernel (more
> > specifically, from ib_sa module) to a user space application (ibacm in this 
> > case) through netlink.
> > The user space application will send back the response. This simple scheme 
> > can achieve the goal 
> > of a local SA cache in user space.
> > 
> > The format of the request and response is diagrammed below:
> > 
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |     MAD        |
> >   ------------------
> > 
> > The kernel requests for a pathrecord, and the user application finds it in 
> > its local cache and sends
> > it to the kernel. If the netlink request fails, the kernel will send the 
> > request to SA through the
> > normal IB path (ib_mad -> hca driver -> wire).
> > 
> > Jason pointed out that this message format was limited to lower stack 
> > format (MAD) and its use
> > could not be readily extended to upper layer modules like rdma_cm. After 
> > lengthy discussions, we 
> > come up with a new and modified scheme, as described below.
> > 
> > The general format of the request and response will be the same:
> > 
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |  Data header   |
> >   ------------------
> >   |      Data      |
> >   ------------------
> > 
> > The data header contains information about the type of request/response, 
> > the status (for response),
> > the type (format) of the data, the total length of the data header + data, 
> > and a flags field about
> > the request/response or data.
> > 
> > Based on the type of the data, the data section may be in different format: 
> > a string about the host
> > name to resolve, an IP4/IP6 address, a pathrecord, a user pathrecord 
> > (struct ib_user_path_rec),
> > or simply a MAD (like our posted patches), etc. Essentially it can be of 
> > any format based on the 
> > data type. The key is to document the format so that the kernel and user 
> > space can communicate 
> > correctly.
> > 
> > The details are described below:
> > 
> > #define IB_NL_VERSION               0x01
> > 
> > #define IB_NL_OP_MASK               0x0F
> > #define IB_NL_OP_RESOLVE    0x01
> > #define IB_NL_OP_QUERY_PATH 0x02
> > #define IB_NL_OP_SET_TIMEOUT        0x03
> > #define IB_NL_OP_ACK                0x80
> 
> If OP_ACK is one bit, why isn't the OP_MASK 0x7f?
> 
> > #define IB_NL_STATUS_SUCCESS        0x0000
> > #define IB_NL_STATUS_ENODATA        0x0001
> 
> Do we need 16 bits for a bool?  In fact, couldn't this actually be
> switched so that the return of the message uses OP_SUCCESS instead of
> OP_ACK?
> 
> In other words, instead of two items here, couldn't the ACK bit be
> dropped entirely and replaced with SUCCESS so that when the user app
> returns the netlink packet, if the op on return == to the op on send, it
> failed, if it's op | SUCCESS, it succeeded?
> 
> > #define IB_NL_DATA_TYPE_INVALID                     0x0000
> > #define IB_NL_DATA_TYPE_NAME                        0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP          0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6         0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD         0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC               0x0005
> > #define IB_NL_DATA_TYPE_MAD                 0x0006
> > 
> > #define IB_NL_FLAGS_PATH_GMP                        1
> > #define IB_NL_FLAGS_PATH_PRIMARY            (1<<1)
> > #define IB_NL_FLAGS_PATH_ALTERNATE          (1<<2)
> > #define IB_NL_FLAGS_PATH_OUTBOUND           (1<<3)
> > #define IB_NL_FLAGS_PATH_INBOUND            (1<<4)
> > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE    (1<<5)
> > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL              (IB_PATH_OUTBOUND | 
> > IB_PATH_INBOUND_REVERSE)
> > #define IB_NL_FLAGS_QUERY_SA                        (1<<31)
> > #define IB_NL_FLAGS_NODELAY                 (1<<30)
> 
> Please keep these in numerical order, don't put <<31 and below it <<30
> 
> > struct ib_nl_data_hdr {
> >     __u8    version;
> >     __u8    opcode;
> >     __u16   status;
> Drop status because we fold it into opcode
> >     __u16   type;
> >     __u16   reserved;
> Drop reserved because we don't need alignment any more
> >     __u32   flags;
> Flags is the only thing using bits fast, and we would want to make this
> header an even 128bits in length, so add a __u32 reserved; here.  That's
> more likely to be useful than the current layout since we are likely to
> run out of flags before anything else.
> >     __u32   length;
> > };
> > 
> > struct ib_nl_data {
> >     struct ib_nl_data_hdr           hdr;
> >     __u8                            data[0];
> > };
> > 
> > 
> > These defines and structures can be added to file 
> > include/upai/rdma/rdma_netlink.h (replace with
> > RDMA_NL prefix) or contained in a seperate file 
> > (include/upai/rdma/ib_netlink.h ???). 
> > 
> > Please share your thoughts.
> 
> I think an extensible netlink framework here is the right way to go,
> certainly better than the one shot method you had first.

The one thing I left out of the above that might be worth changing is
the fact that you bury your sequence number down in your mad header.  If
there is a generic mechanism that multiple modules can use to send
customized data via nl, then it might be worthwhile to have the sequence
moved to the generic level.


-- 
Doug Ledford <[email protected]>
              GPG KeyID: 0E572FDD

Attachment: signature.asc
Description: This is a digitally signed message part

Reply via email to