Michael S. Tsirkin wrote:
I did follow most of the discussions between you and MoniS regarding the
ipoib/bonding integration in OFED 1.2 and elsewhere. However, I don't
see why "bonding is basically broken for ipoib". If you don't mind,
please tell me the bottom line from your perspective.
Here's a short summary of the issues I saw last time. I may have
forgotten something, but here goes:
1. Calling to_ipoib_neigh without the device lock taken might be racy.
I think you need to find another way to look up the device.
2. The AH (address handle) kept in the ipoib_neigh might belong to a device
different from the one start_xmit is called on.
3. When the slave device goes down, the master does not, and since
neighbours are matched to the master, there's no guarantee they will be
cleaned up.
4. The bonding module copies a pointer to the cleanup function in a manner
that is unsafe if ipoib is built as a module.
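Items 2 and 3 both come down to which device an address handle belongs to. The following is a minimal userspace sketch, not ipoib code. It assumes that each AH records the device it was created on (the model_* structs and both helper functions are hypothetical names). Under that assumption, start_xmit can refuse an AH created on another slave, and a slave-down event can invalidate the neighbours resolved through that slave instead of relying on the master going down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the ipoib data structures. */
struct model_dev {
    const char *name;
};

struct model_ah {
    struct model_dev *dev;   /* device the address handle was created on */
};

struct model_neigh {
    struct model_ah *ah;     /* NULL until the path is resolved */
    bool valid;
};

/*
 * Item 2: before using neigh->ah, check that it was created on the
 * device start_xmit is running on. A mismatch means the entry was
 * resolved through another slave and must be re-resolved, not used.
 */
bool ah_usable_on(const struct model_neigh *n, const struct model_dev *xmit_dev)
{
    return n->valid && n->ah != NULL && n->ah->dev == xmit_dev;
}

/*
 * Item 3: the master stays up when a slave goes down, so nothing flushes
 * these entries implicitly. An explicit purge keyed by the slave device
 * invalidates every entry whose AH lives on it. Returns the count purged.
 */
size_t purge_neighs_for_dev(struct model_neigh *tbl, size_t n,
                            const struct model_dev *down_dev)
{
    size_t purged = 0;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].valid && tbl[i].ah != NULL && tbl[i].ah->dev == down_dev) {
            tbl[i].valid = false;
            tbl[i].ah = NULL;
            purged++;
        }
    }
    return purged;
}
```

The model deliberately ignores locking. Doing the same check safely in the real driver runs straight into item 1.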
I think these need to be addressed somehow before the patch is reposted.
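Item 4 is the usual problem of calling through a function pointer owned by a module that can be unloaded. In the kernel, the common remedy is to pin the owning module with try_module_get() for as long as the copied pointer is held, and to release it with module_put(). The sketch below is a single-threaded userspace model of that rule only. The model_* names, copy_cleanup(), drop_cleanup() and example_cleanup() are hypothetical, and this is not how the bonding patch is actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loadable module; not the kernel's struct module. */
struct model_module {
    int refcnt;
    bool unloading;
};

/* Analogue of try_module_get(): no new references once unload has begun. */
bool model_try_get(struct model_module *m)
{
    if (m->unloading)
        return false;
    m->refcnt++;
    return true;
}

/* Analogue of module_put(). */
void model_put(struct model_module *m)
{
    assert(m->refcnt > 0);
    m->refcnt--;
}

/* Unload may only proceed once no copied pointers are outstanding. */
bool model_begin_unload(struct model_module *m)
{
    if (m->refcnt > 0)
        return false;
    m->unloading = true;
    return true;
}

/* A cleanup function pointer copied out of another module, plus its owner. */
struct copied_cleanup {
    void (*fn)(void *arg);
    struct model_module *owner;
};

/* Take the reference when the pointer is copied, not when it is called. */
bool copy_cleanup(struct copied_cleanup *dst, void (*fn)(void *),
                  struct model_module *owner)
{
    if (!model_try_get(owner))
        return false;
    dst->fn = fn;
    dst->owner = owner;
    return true;
}

/* Safe to call: the reference taken in copy_cleanup() keeps the owner loaded. */
void run_cleanup(const struct copied_cleanup *c, void *arg)
{
    c->fn(arg);
}

/* Drop the copy and its reference together. */
void drop_cleanup(struct copied_cleanup *c)
{
    model_put(c->owner);
    c->fn = NULL;
    c->owner = NULL;
}

/* Example cleanup for trying the model out; it only counts invocations. */
static int cleanup_calls;
static void example_cleanup(void *arg)
{
    (void)arg;
    cleanup_calls++;
}
```

Pinning only around each call would not be enough. By then the owner may already be gone, so the reference has to live as long as the copy does.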
Michael, Roland,
Following the high-availability/bonding session at Sonoma, I'd like to
have a BOF to discuss the issues that, from your perspective, should be
addressed before the patch set is merged upstream. Will you be around?
Now, for 1, 3 and 4 above, I am fairly confident that I understand what
Michael is saying, and where we agree and where we don't.
I just need a clarification on (2): can you explain how it can
happen? Looking at the code (e.g. the call chains below), my understanding
is that address handles and struct ipoib_neigh are allocated 1:1.
Or.
ipoib_neigh_alloc <-- neigh_add_path <-- ipoib_path_lookup <-- ipoib_start_xmit
ipoib_neigh_alloc <-- ipoib_mcast_send <-- ipoib_path_lookup <-- ipoib_start_xmit
ipoib_neigh_alloc <-- ipoib_mcast_send <-- ipoib_start_xmit
ipoib_create_ah <-- ipoib_mcast_join_finish
ipoib_create_ah <-- path_rec_completion
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general