On Tue, Nov 30, 2010 at 04:09:31PM +0200, Nir Muchtar wrote:
> > struct ib_nl_qp
> > {
> > 	// String names for IB devices were a mistake, don't perpetuate it.
>
> I don't know of an IB device index mapping like the one in netdevice.
> Am I missing one? Do you mean we should create one?
Yes, definitely. It is very easy to do and goes hand-in-hand with the
typical netlink protocol design.
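To give an idea, something like this is all it would take (just a
sketch, nothing like it exists today - the "index" field on struct
ib_device and the helper name are made up):

/*
 * Hypothetical: hand out an ifindex-style integer when an ib_device
 * registers, the same way net_device gets an ifindex.  "index" would
 * be a new field in struct ib_device.
 */
static atomic_t ib_dev_index = ATOMIC_INIT(0);

static void ib_assign_dev_index(struct ib_device *device)
{
	device->index = atomic_inc_return(&ib_dev_index);
}

The netlink records then carry that integer in ib_dev_if, and a
trivial query can dump the index<->name mapping for tools that want to
print names.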
> > 	__u32 ib_dev_if;
> > 	__u32 qp_num;
> > 	__s32 creator_pid; // -1 for kernel consumer
> > };
> >
> > struct ib_nl_qp_cm_info
> > {
> > 	__u32 cm_state; // enum ib_cm_state
> > 	__u32 lap_state;
> > };
> >
> > struct ib_nl_rdma_cm_info
> > {
> > 	__u32 bound_dev_if;
> > 	__u32 family;
> > 	__u32 cm_state; // enum rdma_cm_state
> > };
> >
> > This captures more information and doesn't tie things to RDMA_CM.
>
> My problem with making everything QP-related is that not everything
> necessarily is. For example, when rdma cm ids are created they are still
> not bound to QPs. I guess you could send all zeros in this case, but as
> more and more such exceptions are needed this framework will become a
> bit unnatural. The current implementation does not tie anything to RDMA
> CM. It allows other modules to export data exactly the way they need.
Well, I was outlining how I think the QP-centric information can be
returned. You are right that we also have non-QP info, like listening
objects, and I think that is best returned with a separate query.
Trying to conflate them seems like it would be trouble. Certainly, as
I've described it, IBNL_QP messages should only refer to active QPs.
Remember you can have as many queries as you like; this is just the QP
object query.
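To be concrete, separate queries could be as simple as separate
commands, something like this (the names are invented, just to
illustrate):

enum {
	IBNL_CMD_GET_QP = 1,	/* dump ib_nl_qp records, active QPs only */
	IBNL_CMD_GET_LISTEN,	/* dump listening / not-yet-bound rdma_cm ids */
	IBNL_CMD_GET_CM_ID,	/* dump IB CM ids that are not attached to a QP */
};

Each command gets its own dump handler and its own record layout, so
nothing has to be zero-filled to fit the QP-shaped record.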
I guess an alternative would be to have many tables: RDMA_CM, QP, and
IB_CM, and then rely on userspace to 'join' them by ifindex+QPN - but
that seems like a lot of work in userspace, and I think pretty much
everyone is going to want the joined data.
> > > +static int cma_get_stats(struct sk_buff **nl_skb, int pid)
> >
> > You really have to use netlink_dump_start here; doing it like this
> > will deadlock. See how other places use NLM_F_DUMP as well.
>
> Well, I already reviewed netlink_dump_start, and this is how it works
> as far as I can see (please correct me if I'm wrong):
> 1. Allocates an skb
> 2. Calls its supplied dump cb
> 3. Calls its supplied done cb if applicable
> 4. Appends NLMSG_DONE
No, this isn't quite right. The dump cb is also called after userspace
calls recvmsg(), which continues the dump once the buffer is freed.
The idea is to return a bit of the table on each dump callback
invocation.
The way it is used is:
1. Userspace does send()
2. Kernel calls netlink_dump_start()
3. netlink_dump_start calls the callback, which returns non-zero
4. send() returns in userspace
5. Userspace does recv()
6. Kernel copies the data from #3 into userspace
7. netlink_dump calls the callback, which returns non-zero
8. recv() returns in userspace
[...]
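Roughly, the cma side would look something like this (untested sketch -
ibnl_qp_list, ibnl_fill_qp() and ibnl_sock are invented names, the
point is the cb->args[] continuation pattern):

static int ibnl_qp_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct ib_qp_entry *qp;		/* hypothetical per-QP tracking entry */
	int idx = 0, start = cb->args[0];

	/* locking of the table omitted for brevity */
	list_for_each_entry(qp, &ibnl_qp_list, list) {
		if (idx < start) {
			idx++;
			continue;
		}
		if (ibnl_fill_qp(skb, cb, qp) < 0)
			break;		/* skb full, resume from this entry */
		idx++;
	}
	cb->args[0] = idx;		/* continuation state for the next call */
	return skb->len;		/* non-zero: call me again after recvmsg() */
}

/* in the message handler, when NLM_F_DUMP is set: */
if (nlh->nlmsg_flags & NLM_F_DUMP)
	return netlink_dump_start(ibnl_sock, skb, nlh, ibnl_qp_dump, NULL);

The netlink core then takes care of allocating the skbs, looping with
userspace's recv() calls, and appending NLMSG_DONE once the callback
finally returns 0.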
> Thanks again for all your input.
Thanks for working on this!
Jason