Hal Rosenstock wrote:
I'm not quite parsing what is the same and what is different in the results
(and I presume the only variable is the SM).

Yes, this is confusing. I'll try to summarize the various behaviors I'm seeing.

First, there are two machines. One has 8 nodes and runs a Topspin switch with the Cisco SM on it. The other has 128 nodes and runs a Mellanox switch with OpenSM on a compute node. Both use OFED v1.2. Below is how many groups I can join using my test program (described elsewhere in the thread):

On the 8 node machine:
8 procs (one per node) -- 14 groups.
16 procs (two per node) -- 4 groups.

On the 128 node machine:
8 procs (one per node, 8 nodes used) -- 14 groups.
16 procs (two per node, 8 nodes used) -- apparently unlimited; I stopped after 750.

A couple of peculiarities complicate this. First, on either machine, if I haven't used IB multicast for about a day (I haven't tried to pin down exactly how long), I can join only 4 groups in any of the run scenarios listed above. After a couple of runs that hit errors after 4 groups, I consistently get the group counts above for the rest of the work day.

Second, in the cases in which I am able to join 14 groups, if I run my test program twice simultaneously on the same nodes, I am able to join a maximum of 14 groups total between the two running tests (as opposed to 14 per test run). Running the test twice simultaneously using a disjoint set of nodes is not an issue.

This makes me think the switch is involved. Is this correct?


I doubt it. It is either the end station, the SM, or a combination of the two.

OK.

OK, this makes sense, but I still don't see where all the time is going.
Should the fact that the switches haven't been reprogrammed since
leaving the groups really affect how long a subsequent join takes?
I'm not convinced.


It takes time for the SM to recalculate the multicast tree. While leaves can
be lazy, I forget whether joins are synchronous or not.

Is the algorithm for recalculating the tree documented at all? If not, where is the code for it (assuming I have access)? I feel like I'm missing something that explains why it's so costly.

Andrew


Is this time being consumed by the switches when they are asked to
reprogram their tables (I assume some sort of routing table is used
internally)?


This is relatively quick compared to the policy for the SM rerouting of
multicast based on joins/leaves/group creation/deletion.

OK.  Thanks for the insight.

Andrew
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general