Now the more interesting part. I'm now able to run on a 128-node machine using OpenSM running on a node (before, I was running on an 8-node machine, which I'm told runs the Cisco SM on a Topspin switch). On this machine, if I run my benchmark with two processes per node (instead of one, i.e., mpirun -np 16 with 8 nodes), I'm able to join more than 750 groups simultaneously from one QP on each process. Stranger still, running the same thing on the 8-node machine, I can join only 4 groups.
Are the switches and HCAs in the two setups the same? If you run the same SM on both clusters, do you see the same results?
While doing so, I noticed that the time from calling rdma_join_multicast() to the arrival of the join event stayed fairly constant (around 0.001 sec), while the time from the join call to actually receiving messages on the group steadily increased from around 0.1 sec to around 2.7 sec with 750+ groups. Furthermore, this time does not drop back to 0.1 sec if I stop the benchmark and run it (or any of my other multicast code) again. The increase is understandable within a single program run, but the fact that the behavior persists across runs concerns me -- it feels like a bug, but I don't have much concrete here.
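For reference, the two intervals are measured roughly as in the sketch below. This is a simplified stand-in for the benchmark's instrumentation, not the actual code: it assumes 'id' is a UD rdma_cm_id created on event channel 'ch' with a QP and posted receive buffers, and that some other node is already sending to the group; the busy-poll on id->recv_cq stands in for the real receive path.

#include <stdio.h>
#include <time.h>
#include <rdma/rdma_cma.h>

static double elapsed(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Time a single join: join call -> join event, and join call -> first
 * message received on the group.  Assumes receives are already posted
 * and a sender is active on the group. */
static int timed_join(struct rdma_cm_id *id, struct rdma_event_channel *ch,
                      struct sockaddr *mcast_addr)
{
    struct timespec t_join, t_event, t_first;
    struct rdma_cm_event *event;
    struct ibv_wc wc;
    int ret;

    clock_gettime(CLOCK_MONOTONIC, &t_join);
    ret = rdma_join_multicast(id, mcast_addr, NULL);
    if (ret)
        return ret;

    /* interval 1: join call -> RDMA_CM_EVENT_MULTICAST_JOIN (~0.001 sec) */
    ret = rdma_get_cm_event(ch, &event);
    if (ret)
        return ret;
    if (event->event != RDMA_CM_EVENT_MULTICAST_JOIN) {
        rdma_ack_cm_event(event);
        return -1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t_event);
    rdma_ack_cm_event(event);

    /* interval 2: join call -> first message actually received on the
     * group (the interval that grows from ~0.1 sec to ~2.7 sec here) */
    do {
        ret = ibv_poll_cq(id->recv_cq, 1, &wc);
    } while (ret == 0);
    if (ret < 0)
        return ret;
    clock_gettime(CLOCK_MONOTONIC, &t_first);

    printf("join->event  %.4f sec\n", elapsed(t_join, t_event));
    printf("join->first  %.4f sec\n", elapsed(t_join, t_first));
    return 0;
}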
Even after all nodes leave all multicast groups, I don't believe there's a requirement for the SA to reprogram the switches immediately. So if the switches, or the configuration of the switches, are part of the problem, I can imagine seeing issues between runs.
When rdma_join_multicast() reports the join event, it means either that the SA has been notified of the join request or, if the port has already joined the group, that a reference count on the group has been incremented. The SA may still require time to program the switch forwarding tables.
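As a rough illustration (this is not code from librdmacm itself; the handler name and messages are made up), a consumer-side handler for that event might look like the following, with the caveat above in mind:

#include <stdio.h>
#include <rdma/rdma_cma.h>

/* Illustrative handler for multicast CM events.  Getting
 * RDMA_CM_EVENT_MULTICAST_JOIN means the SA accepted the join request
 * (or an existing join on this port was reference counted); it does not
 * mean the switch forwarding tables have been programmed yet, so traffic
 * on the group may still lag behind the event. */
static int handle_mcast_event(struct rdma_cm_event *event)
{
    switch (event->event) {
    case RDMA_CM_EVENT_MULTICAST_JOIN:
        printf("joined group: qp_num 0x%x qkey 0x%x\n",
               event->param.ud.qp_num, event->param.ud.qkey);
        return 0;
    case RDMA_CM_EVENT_MULTICAST_ERROR:
        fprintf(stderr, "multicast error, status %d\n", event->status);
        return -1;
    default:
        return 0;    /* not a multicast event */
    }
}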
- Sean
