On Mon, 2010-11-29 at 12:11 -0700, Jason Gunthorpe wrote:
> On Mon, Nov 29, 2010 at 06:16:39PM +0200, Nir Muchtar wrote:
> > Add callbacks and data types for statistics export.
> > One callback is implemented that exports all of the current devices/ids.
> > Add/remove the callback to IB Netlink on init/cleanup.
>
> Please include the schema for the messages you are adding to netlink
> in the comment so other people can review them easily.
>
> Looks to me like you have messages of the form:
>
> {
> [IBNL_RDMA_CM_DEVICE_NAME - char[]]
> [IBNL_RDMA_CM_ID_STATS - struct rdma_cm_id_stats]*
> }*
>
Yes that's the basic structure. I'll add an explanation next time.
> As I've said before, I don't want to see this tied to RDMA_CM, that is
> not general enough for a new userspace API. The use of
> IBNL_RDMA_CM_DEVICE_NAME is very un-netlink-like and is just an ugly
> hack to avoid addressing that problem.
This is done to save space in the netlink messages.
I am open for ideas for improvements.
I thought of another possibility: We can make another op for rdma
devices only with index mapping. This could create a problems if a
device is added/removed between calls though.
See my question about your suggestion below.
>
> How about messages of the form:
> {
> IBNL_QP - struct ib_nl_qp
> IBNL_QP_ATTR - struct ib_qp_attr (or a reasonable subset)
> IBNL_QP_CM_INFO u8[] - struct ib_nl_qp_cm_info
> IBNL_QP_CM_SERVICE_ID u8[]
> IBNL_RDMA_CM_INFO - struct ib_nl_rdma_cm_info
> IBNL_RDMA_CM_SRC - u8[] // This is a sockaddr_*
> IBNL_RDMA_CM_DST - u8[]
> }+
>
> Meaning there is an array of IBNL_QP messages which contains various
> attributes. Similar to how everything else in netlink works.
>
> struct ib_nl_qp
> {
> // String names for IB devices was a mistake, don't perpetuate it.
I don't know of an IB device index mapping like the one in netdevice.
Am I missing one? Do you mean we should create one?
> __u32 ib_dev_if;
> __u32 qp_num;
> __s32 creator_pid; // -1 for kernel consumer
> };
>
> struct ib_nl_qp_cm_info
> {
> u32 cm_state; // enum ib_cm_state
> u32 lap_state;
> };
>
> struct ib_nl_rdma_cm_info
> {
> __u32 bound_dev_if;
> __u32 family;
> __u32 cm_state; // enum rdma_cm_state
> };
>
> This captures more information and doesn't tie things to RDMA_CM.
My problem with making everything QP related is that not everything
necessarily is. For example, when creating rdma cm id's they are still
not bound to QP's. I guess you could send all zeros in this case, but as
more and more of such exceptions are needed this framework will become a
bit unnatural. The current implementation is not tying anything to RDMA
CM. It allows other modules to export data exactly the way they need.
>
> > +static int cma_get_stats(struct sk_buff **nl_skb, int pid)
>
> You really have to use netlink_dump_start here, doing it like this
> will deadlock. See how other places use NLM_F_DUMP as well.
Well, I already reviewed netlink_dump_start, and this is how it works
as much as I can see (Please correct me if I'm wrong):
1. Allocates an skb
2. Calls its supplied dump cb
3. Calls its supplied done cb if if applicable
4. Appends NLMSG_DONE
This appears to be executed synchronously, within the context of the
calling thread. So I couldn't figure out how to use it for avoiding long
locking times.
Anyway, what I tried to achieve is a mechanism that allocates more skb's
as they are needed, and separate it from the calling module. Do you know
of an inherent way to make netlink_dump_start to do that?
>
> > +struct rdma_cm_id_stats {
> > + enum rdma_node_type nt;
> > + int port_num;
> > + int bound_dev_if;
> > + __be16 local_port;
> > + __be16 remote_port;
> > + __be32 local_addr[4];
> > + __be32 remote_addr[4];
> > + enum rdma_port_space ps;
> > + enum rdma_cm_state cm_state;
> > + u32 qp_num;
> > + pid_t pid;
> > +};
>
> Putting enums in a user/kernel structure like this makes me nervous
> that we'll have a 32/64 bit problem. It would be more consistent with
> the uverbs stuff to use explicit fixed width types.
Yes you're right. Also, I see now that this is not normally done this
way, so I'll drop the enums.
>
> Jason
Thanks again for all your input.
Nir
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html