Sean Hefty wrote:
4) rdma_ucm gets this event and dutifully posts it for the user app to
reap. But since the app doesn't reap this event and exit or at least
destroy the cm id, nothing else happens.
For the rdma_ucm, it should post the event, but destroy the underlying
rdma_cm_id (possibly by returning non-zero from the remove callback or from
another thread). The only call from user space that the rdma_ucm will allow to
succeed at that point is destroy. State checking and synchronization would need to be
used to mark that the kernel id has already been freed.
We just need to ensure that the rdma_ucm doesn't try to destroy an id that is in
another downcall, and I think the synchronization will be non-trivial.
In addition I think there is an assumption in the rdma_ucm that the
underlying rdma_cm_id exists whenever the ucma context is still valid.
We might need some state in the ucma context that sez "no rdma_cm_id
exists". Then all the ucma code will have to check this before
utilizing the rdma_cm_id. Maybe just checking the ctx->cm_id pointer is
sufficient.
In other words, I think we want the ucma context to stay around until
the application destroys it (via explicit means or via exit). But the
rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVE event.
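
To make the idea concrete, here is a rough user-space sketch of the state
check described above. It is not the actual ucma code: the names
sketch_ctx, sketch_cm_id and the sketch_ctx_*() functions are all
hypothetical, a pthread mutex stands in for whatever locking the kernel
code would really use, and returning -ENODEV is just an assumed choice of
error code. The point is only that the context outlives the id, and every
downcall other than destroy checks the cm_id pointer under the lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's rdma_cm_id. */
struct sketch_cm_id {
	int unused;
};

/* Hypothetical stand-in for the ucma context. */
struct sketch_ctx {
	pthread_mutex_t lock;
	struct sketch_cm_id *cm_id;	/* NULL means "no rdma_cm_id exists" */
	int events_pending;		/* events posted but not yet reaped */
};

struct sketch_ctx *sketch_ctx_create(void)
{
	struct sketch_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->cm_id = calloc(1, sizeof(*ctx->cm_id));
	if (!ctx->cm_id) {
		free(ctx);
		return NULL;
	}
	pthread_mutex_init(&ctx->lock, NULL);
	return ctx;
}

/* DEVICE_REMOVE path: post the event, free the id, keep the context. */
void sketch_ctx_device_removed(struct sketch_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->events_pending++;		/* the app can still reap this later */
	free(ctx->cm_id);
	ctx->cm_id = NULL;
	pthread_mutex_unlock(&ctx->lock);
}

/* Any downcall other than destroy: fail once the id is gone. */
int sketch_ctx_downcall(struct sketch_ctx *ctx)
{
	int ret = 0;

	pthread_mutex_lock(&ctx->lock);
	if (!ctx->cm_id)
		ret = -ENODEV;
	/* else: use ctx->cm_id here, still holding the lock */
	pthread_mutex_unlock(&ctx->lock);
	return ret;
}

/* Destroy works whether or not the id has already been freed. */
void sketch_ctx_destroy(struct sketch_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	free(ctx->cm_id);		/* free(NULL) is a no-op */
	ctx->cm_id = NULL;
	pthread_mutex_unlock(&ctx->lock);
	pthread_mutex_destroy(&ctx->lock);
	free(ctx);
}
```

Holding the lock across the whole downcall is the simplest way to keep the
removal path from freeing an id that another downcall is using, but it
also serializes every downcall on the context. The real code may well want
something finer grained.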
Steve.