I've run into a problem where it appears that I cannot join more than 14 multicast groups from a single HCA. I'm using the RDMA CM UD/multicast interface from an OFED v1.2 nightly build, and I pass a '0' address when joining so that the SM allocates an unused address. The first 14 rdma_join_multicast() calls succeed: a MULTICAST_JOIN event comes through for each of them and everything works. But the 15th call to rdma_join_multicast() returns -1 and sets errno to 99 (EADDRNOTAVAIL, 'Cannot assign requested address').
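To pin down exactly which join fails, the joins can be driven from a small counting loop that stops at the first error and checks errno by name rather than by number (on Linux, errno 99 is EADDRNOTAVAIL). This is a minimal sketch that assumes a hypothetical `join_fn` callback wrapping rdma_join_multicast() with a zeroed address plus the wait for the MULTICAST_JOIN event. The callback returns 0 on success, or -1 with errno set. The names `join_fn` and `count_joins` are illustrative and not part of librdmacm.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: performs one join (for example rdma_join_multicast()
 * with a zero address, then waiting for the MULTICAST_JOIN event).
 * Returns 0 on success, -1 with errno set on failure. */
typedef int (*join_fn)(void *ctx, int index);

/* Calls join() until it fails or max_groups joins have succeeded.
 * Returns the number of successful joins. *err receives the errno of the
 * failing call, or 0 if every attempt succeeded. */
static int count_joins(join_fn join, void *ctx, int max_groups, int *err)
{
    int n;

    *err = 0;
    for (n = 0; n < max_groups; n++) {
        if (join(ctx, n) != 0) {
            *err = errno; /* save errno before fprintf() can change it */
            if (*err == EADDRNOTAVAIL)
                fprintf(stderr, "join %d: EADDRNOTAVAIL (%s)\n",
                        n + 1, strerror(*err));
            else
                fprintf(stderr, "join %d failed: %s\n",
                        n + 1, strerror(*err));
            break;
        }
    }
    return n;
}
```

With the behavior described above, this loop would return 14 and report the 15th join as failing with EADDRNOTAVAIL.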

Note that I'm using a single QP per process to do all the joins. Things get weirder if I run two instances of my program on the same node -- as soon as the total between the two instances reaches 14, neither instance can join any more groups. Also, right now my code hangs when this happens -- if I kill off one of the two instances and run a third instance (while leaving the other hung, holding some number of groups), the third instance is not able to join ANY groups. The behavior resets when I kill all instances.
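One possible cause of the hang is that the program blocks in rdma_get_cm_event() waiting for a MULTICAST_JOIN event. That event will not arrive for a join whose rdma_join_multicast() call already returned -1. The post does not say where the code blocks, so this is an assumption. Either way, the wait can be bounded by polling the event channel's file descriptor first, because librdmacm's struct rdma_event_channel exposes it as its `fd` member. Below is a minimal sketch using POSIX poll(). The helper name `wait_for_event_fd` is hypothetical.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set).
 * If poll() is interrupted by a signal it is retried with the full timeout. */
static int wait_for_event_fd(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0; /* timed out: no event arrived */
    if (pfd.revents & POLLIN)
        return 1;
    errno = EIO; /* POLLERR/POLLHUP/POLLNVAL without data */
    return -1;
}
```

In the join path, this would be called with the channel's fd before rdma_get_cm_event(). A 0 return means no join event arrived in time, so the program can report the failure and, for instance, leave the groups it already holds with rdma_leave_multicast() instead of hanging while it still holds them.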

Two instances running on separate nodes (on the same network) do not appear to interfere with each other as described above, but each still errors out on its 15th join.

This feels like a bug to me, but regardless, this limit is WAY too low. Any ideas what might be going on, or how I can work around it?

Andrew
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general