> On Thu, May 21, 2015 at 07:52:36AM -0600, Wan, Kaike wrote:
> >
> > The general format of the request and response will be the same:
> >
> >   ------------------
> >   | netlink header |
> >   ------------------
> >   |  Data header   |
> >   ------------------
> >   |      Data      |
> >   ------------------
> >
> > The data header contains information about the type of
> > request/response, the status (for response), the type (format) of the
> > data, the total length of the data header + data, and a flags field about 
> > the
> request/response or data.
> >
> > Based on the type of the data, the data section may be in different
> > format: a string about the host name to resolve, an IP4/IP6 address, a
> > pathrecord, a user pathrecord (struct ib_user_path_rec), or simply a MAD
> (like our posted patches), etc.
> 
> I think given the new plans this is not really necessary.

Why not? Communication of the ib_mad/ib_sa layer with user space might find 
using MAD format more natural. It all depends  on where  the kernel component 
is, or the format of the input data that are available to it.

> 
> >
> > Essentially it can be of any format based on the data type. The key is
> > to document the format so that the kernel and user space can
> > communicate correctly.
> >
> > The details are described below:
> >
> > #define IB_NL_VERSION               0x01
> 
> Change all "IB" to "RDMA"

Will do.

> 
> >
> > #define IB_NL_OP_MASK               0x0F
> > #define IB_NL_OP_RESOLVE    0x01
> > #define IB_NL_OP_QUERY_PATH 0x02
> > #define IB_NL_OP_SET_TIMEOUT        0x03
> > #define IB_NL_OP_ACK                0x80
> >
> > #define IB_NL_STATUS_SUCCESS        0x0000
> > #define IB_NL_STATUS_ENODATA        0x0001
> 
> If we do what Doug suggested should we just make OP 16bits with the high
> bit ACK/NACK?
> 
> Then use the other u8 as a detailed status if needed.
In that case,  Use :

   __u8 version;
   __u8 status;
  __u16 opcode;

We will have more opcode space. 

> 
> >
> > #define IB_NL_DATA_TYPE_INVALID                     0x0000
> > #define IB_NL_DATA_TYPE_NAME                        0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP          0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6         0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD         0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC               0x0005
> 
> Do we need both PATH_RECORD and USER_PATH_REC?

With subheader definition (In my response to Jason's comments), we need only 
PATH_RECORD type. I will update my RFC.

> 
> I'm having trouble determining when the OP == QUERY_PATH and the
> DATA_TYPE != PATH_RECORD.
> 
> Why don't we remove "QUERY_PATH" above and allow OP == RESOLVE and
> DATA_TYPE == PATH_RECORD be a "query for path record"?

TRUE.

> 
> > #define IB_NL_DATA_TYPE_MAD                 0x0006
> 
> I would drop this for now.
> 
> >
> > #define IB_NL_FLAGS_PATH_GMP                        1
> > #define IB_NL_FLAGS_PATH_PRIMARY            (1<<1)
> > #define IB_NL_FLAGS_PATH_ALTERNATE          (1<<2)
> > #define IB_NL_FLAGS_PATH_OUTBOUND           (1<<3)
> > #define IB_NL_FLAGS_PATH_INBOUND            (1<<4)
> > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE    (1<<5)
> > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL
>       (IB_PATH_OUTBOUND | IB_PATH_INBOUND_REVERSE)
> > #define IB_NL_FLAGS_QUERY_SA                        (1<<31)
> > #define IB_NL_FLAGS_NODELAY                 (1<<30)
> >
> > struct ib_nl_data_hdr {
> >     __u8    version;
> >     __u8    opcode;
> >     __u16   status;
> >     __u16   type;
> >     __u16   reserved;
> >     __u32   flags;
> >     __u32   length;
> 
> The overall message length is in the netlink header.  So keeping in mind
> Jasons comments regarding returning multiple data records.

With this length field, a netlink package could carry multiple 
requests/responses (opcodes). Each request/response could carry multiple data 
sections.

> 
> I think this should be the length of individual records with a "num records"
> also specified

With data sections, each section may have different length. Within a data 
section, when all records have the same size, we can carry the number of 
records field. 

.  I would much prefer this over the yucky IBTA RMPP method
> of implicit record sizes needing to be divided into the overall message size.
> 
> > };
> 
> Should we have a "class" value in the header somewhere?  With multiple
> user space listeners it could be easier to mux messages with such a value.
> The class/seq would then differentiate the message.

Then use a different multicast group. It does not make sense to have multiple 
listeners for the same multicast group and they all can serve the same request.

> 
> >
> > struct ib_nl_data {
> >     struct ib_nl_data_hdr           hdr;
> >     __u8                            data[0];
> > };
> >
> >
> > These defines and structures can be added to file
> > include/upai/rdma/rdma_netlink.h (replace with RDMA_NL prefix) or
> contained in a seperate file (include/upai/rdma/ib_netlink.h ???).
> 
> This is not as important as getting the protocol down.

Minor.

> 
> Ira

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to