On Tue, 2010-11-30 at 11:19 -0700, Jason Gunthorpe wrote:

> > I don't know of an IB device index mapping like the one in netdevice.
> > Am I missing one? Do you mean we should create one?
> 
> > Yes, definitely. It is very easy to do and goes hand-in-hand with the
> typical netlink protocol design.

I agree, but this is somewhat out of scope for the current patches, and
a change like this deserves some thought: it needs to supply userspace
with mapping functions, and I don't think it will be that easy to
complete. The patch in its current state uses names, but it doesn't
perpetuate their use, because the RDMA CM export is separate from the
infrastructure. Once such a mapping exists, it will be very easy to use
here.

> Well, I was outlining how I think the QP-centric information can be
> returned. You are right that we also have non-QP info, like listening
> objects, and I think that can be best returned with a separate
> query. Trying to conflate them seems like it would be
> trouble. Certainly, as I've described IBNL_QP messages should only
> refer to active QPs.
> 
> Remember you can have as many queries as you like, this is just the QP
> object query.
> 
> I guess an alternative would be to have many tables: RDMA_CM, QP, and
> IB_CM and then rely on userspace to 'join' them by ifindex+QPN - but
> that seems like a lot of work in userspace and I think pretty much
> everyone is going to want to have the joined data.

So we are in agreement that more than one export type is required here.
I do agree that your suggestion will make sense once we try to export
QP-related data, so perhaps we can agree that I will fully prepare for
such a scheme, making it easy to implement later. By that I mean that
the infrastructure will allow adding arbitrary attributes to messages
(in both type and size). What do you think?

> No, this isn't quite right. The dumpcb is also called after userspace
> calls recvmsg(), which continues the dump once the buffer is
> freed. The idea is to return a bit of the table on every dump
> callback.
> 
> The way it is used is:
>  1. Userspace does send()
>  2. Kernel calls netlink_dump_start()
>  3. netlink_dump_start calls callback which returns non-zero
>  4. send() returns in userspace
>  5. Userspace does recv()
>  6. Kernel copies the data from #3 into userspace
>  7. netlink_dump calls callback which returns non-zero
>  8. recv() returns in userspace

Yes, that's correct, but inet_diag handles the last two steps by
updating its cb index, not netlink_dump_start. If we use it that way, we
can run into problems with the data structure changing between
subsequent recv calls, so if we want to keep the data consistent we
would still need to employ locking. I don't see a way to return a
consistent snapshot without locking and without a session mechanism of
some sort.

Nir
