Andrew Friedley wrote:
Hal Rosenstock wrote:
I'm not quite parsing what is the same and what is different in the results
(and I presume the only variable is the SM).

Yes, this is confusing; I'll try to summarize the various behaviors I'm getting.

First, there are two machines. One has 8 nodes and runs a Topspin switch with the Cisco SM on it. The other has 128 nodes and runs a Mellanox switch with OpenSM on a compute node. OFED v1.2 is used on both. Below is how many groups I can join using my test program (described elsewhere in the thread):

On the 8 node machine:
8 procs (one per node) -- 14 groups.
16 procs (two per node) -- 4 groups.

On the 128 node machine:
8 procs (one per node, 8 nodes used) -- 14 groups.
16 procs (two per node, 8 nodes used) -- unlimited? I stopped past 750.

Some peculiarities complicate this. On either machine, if I haven't done anything with IB multicast for, say, a day (I haven't tried to pin down exactly how long), then in any of the run scenarios listed above I can join only 4 groups. After a couple of runs where I hit errors past 4 groups, I consistently get the group counts above for the rest of the work day.

Second, in the cases in which I am able to join 14 groups, if I run my test program twice simultaneously on the same nodes, I am able to join a maximum of 14 groups total between the two running tests (as opposed to 14 per test run). Running the test twice simultaneously using a disjoint set of nodes is not an issue.

So I sent that last email before I meant to :) Need to eat. I've managed to confuse myself a little here too -- it looks like changing from the Cisco SM to OpenSM did not change behavior on the 8 node machine. At least, I'm still getting the same results as above now that it's back on the Cisco SM.

Also some newer results. I had a long run going on the 128 node machine to see how many groups I really could join, and it errored out after joining 892 groups successfully. Specifically, I got an RDMA_CM_EVENT_MULTICAST_ERROR event containing status -22 ('Unknown error' according to strerror). errno is still cleared to 'Success'. I don't have time to go look at the code to see where this came from right now, but does anyone know what it means?

Andrew


This makes me think the switch is involved; is this correct?


I doubt it. It is either the end station, the SM, or a combination of the two.

OK.

OK, this makes sense, but I still don't see where all the time is going.
  Should the fact that the switches haven't been reprogrammed since
leaving the groups really affect how long it takes to do a subsequent
join?  I'm not convinced.


It takes time for the SM to recalculate the multicast tree. While leaves can
be lazy, I forget whether joins are synchronous or not.

Is the algorithm for recalculating the tree documented at all? Or, where is the code for it (assuming I have access)? I feel like I'm missing something here that explains why it's so costly.

Andrew


Is this time being consumed by the switches when they are asked to
reprogram their tables (I assume some sort of routing table is used
internally)?


This is relatively quick compared to the SM's policy for rerouting
multicast in response to joins/leaves and group creation/deletion.

OK.  Thanks for the insight.

Andrew
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general