Sean Hefty wrote:
and in concept I prefer to:
* Always report the event and let ULPs ignore it
* Let someone come up with a fantastically simple way of reporting new events
I am fine with the approach of always reporting the event and letting ULPs ignore it. Looking at how the ABI versions are exchanged between the rdma_ucm module and librdmacm, I don't see many alternatives other than bumping the ABI version to five. If librdmacm can somehow note which ABI version the app was built against, we could bump the ABI version to five and require the user to upgrade librdmacm to be able to run, but have --librdmacm-- hide this event from the user when "his version" of the ABI is lower.
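
To make the "hide the event in librdmacm" idea concrete, here is a minimal sketch of the filtering step only. It assumes the library has already learned, by some means not settled in this thread, which ABI version the application was built against. All names (sketch_should_deliver, SKETCH_EVENT_NEW, and so on) are hypothetical and are not part of librdmacm; only the number five comes from the proposal above.

```c
#include <stdbool.h>

/* Assumption: the new event becomes visible at ABI version 5, as proposed. */
#define SKETCH_ABI_WITH_NEW_EVENT 5

/* Hypothetical stand-ins for real event codes. */
enum sketch_event {
    SKETCH_EVENT_EXISTING,  /* any event older applications already know */
    SKETCH_EVENT_NEW        /* the newly added event under discussion */
};

/* Lowest application ABI version that is expected to understand the event. */
int sketch_event_min_abi(enum sketch_event ev)
{
    return ev == SKETCH_EVENT_NEW ? SKETCH_ABI_WITH_NEW_EVENT : 1;
}

/*
 * Returns true if the library should pass the event up to an application
 * built against app_abi, false if the event should be silently dropped.
 */
bool sketch_should_deliver(int app_abi, enum sketch_event ev)
{
    return app_abi >= sketch_event_min_abi(ev);
}
```

With this check in the event-retrieval path, an application built against ABI four would never see the new event, while one built against ABI five would. How app_abi is obtained remains the open question.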

I spent most of the morning looking at this, and until I know what the
trade-offs really are in the implementation, I can't say that I have a strong
preference for how to deal with any of this.  My main concerns are:

* All callbacks from the rdma_cm are serialized
* We minimize the overhead of reporting events
* We don't lose events
* If the user returns a non-zero value from a callback, the rdma_cm_id is
  destroyed, and no further callbacks are invoked.
Thanks for looking into that. Yes, I think it's correct and fair to require that all these characteristics still hold after merging the new event.
The existing rdma_cm callbacks are naturally serialized with each other.
(Callback for connect after resolve route after resolve address...)  This allows
using the stack for event structures, but the cost is complex synchronization
with device removal.  Supporting additional events while meeting the concerns
listed above will be equally challenging.  So if we can simplify device removal
handling, then supporting similar types of events should be easier as well.

If we can guarantee that this works, one option is to acquire a mutex before
invoking a callback on an rdma_cm_id.  I hesitate to hold any locks while in a
callback, since it restricts what the user can do, but if the mutex is only used
to synchronize calling the user back, it may work, since the rdma_cm never
invokes a callback from a downcall.  This should simplify the device removal
handling, eliminating wait_remove and dev_remove from the rdma_cm_id.
I would like to look into this possibility, which, as you stated later in your post, is simpler than the alternatives and would also make the current device-removal code less complex. So can/should that mutex be the existing one defined in cma.c, or a new one?
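
To illustrate the mutex idea, below is a user-space sketch using pthreads rather than kernel primitives. It is not the cma.c code: the struct, the function names and the event values are all hypothetical. It only shows the pattern of taking a per-id mutex around the user callback, so that callbacks are serialized and a non-zero return stops any further callbacks on that id.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event codes, for illustration only. */
enum { SKETCH_EVENT_OTHER, SKETCH_EVENT_DEVICE_REMOVAL };

struct sketch_id;
typedef int (*sketch_handler)(struct sketch_id *id, int event);

/* Hypothetical stand-in for an rdma_cm_id. */
struct sketch_id {
    pthread_mutex_t handler_mutex; /* held only while calling the user back */
    sketch_handler  handler;
    bool            destroyed;     /* set once the handler returns non-zero */
    int             delivered;     /* number of callbacks actually invoked */
};

void sketch_id_init(struct sketch_id *id, sketch_handler handler)
{
    pthread_mutex_init(&id->handler_mutex, NULL);
    id->handler = handler;
    id->destroyed = false;
    id->delivered = 0;
}

/*
 * Report one event. The mutex serializes all callbacks on this id, whether
 * they come from connection progress or from device removal. Returns true
 * if the callback ran, false if the id had already been destroyed.
 */
bool sketch_report_event(struct sketch_id *id, int event)
{
    bool ran = false;

    pthread_mutex_lock(&id->handler_mutex);
    if (!id->destroyed) {
        ran = true;
        id->delivered++;
        if (id->handler(id, event) != 0)
            id->destroyed = true;  /* no further callbacks after this */
    }
    pthread_mutex_unlock(&id->handler_mutex);
    return ran;
}

/* Example handler: asks for destruction when it sees device removal. */
int sketch_example_handler(struct sketch_id *id, int event)
{
    (void)id;
    return event == SKETCH_EVENT_DEVICE_REMOVAL;
}
```

The pattern relies on the property stated above that a callback is never invoked from a downcall. If the handler could re-enter sketch_report_event on the same id, the non-recursive mutex would deadlock.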

Or

