Andrew Friedley wrote:
Hal Rosenstock wrote:
I'm not quite parsing what is the same and what is different in the
results (and I presume the only variable is the SM).
Yes, this is confusing; I'll try to summarize the various behaviors I'm
getting.
First, there are two machines. One has 8 nodes and runs a Topspin
switch with the Cisco SM on it. The other has 128 nodes and runs a
Mellanox switch with OpenSM on a compute node. OFED v1.2 is used on
both. Below is how many groups I can join using my test program
(described elsewhere in the thread); a rough sketch of the join loop
follows the numbers.
On the 8 node machine:
8 procs (one per node) -- 14 groups.
16 procs (two per node) -- 4 groups.
On the 128 node machine:
8 procs (one per node, 8 nodes used) -- 14 groups.
16 procs (two per node, 8 nodes used) -- unlimited? I stopped past 750.
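For reference, here is a rough sketch of the kind of join loop
involved. This is only an illustration of the librdmacm calls, not the
actual test program; it assumes an rdma_cm_id already created on an
event channel with rdma_create_id() and bound to the local IB address
with rdma_bind_addr():

/* Sketch only -- not the actual test program. Joins a distinct IPv6
 * multicast address per iteration until a join fails; the CM maps each
 * address to an IB MGID and issues an SA MCMemberRecord join. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <netinet/in.h>
#include <rdma/rdma_cma.h>

static int count_joinable_groups(struct rdma_cm_id *id,
                                 struct rdma_event_channel *ch)
{
    int joined = 0;

    for (;;) {
        struct sockaddr_in6 addr;
        struct rdma_cm_event *event;
        uint16_t n = (uint16_t)(joined + 1);

        /* Build a distinct multicast address for this iteration. */
        memset(&addr, 0, sizeof(addr));
        addr.sin6_family = AF_INET6;
        addr.sin6_addr.s6_addr[0]  = 0xff;      /* IPv6 multicast */
        addr.sin6_addr.s6_addr[1]  = 0x0e;      /* global scope   */
        addr.sin6_addr.s6_addr[14] = n >> 8;
        addr.sin6_addr.s6_addr[15] = n & 0xff;

        if (rdma_join_multicast(id, (struct sockaddr *)&addr, NULL))
            break;
        if (rdma_get_cm_event(ch, &event))
            break;
        if (event->event != RDMA_CM_EVENT_MULTICAST_JOIN) {
            fprintf(stderr, "join %d failed: event %d, status %d\n",
                    n, event->event, event->status);
            rdma_ack_cm_event(event);
            break;
        }
        rdma_ack_cm_event(event);
        joined++;
    }
    return joined;
}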
Some peculiarities complicate this. On either machine, if I haven't
done anything with IB multicast for, say, a day (I haven't tried to pin
down exactly how long), any run scenario listed above can join only 4
groups. After a couple of runs that hit errors past 4 groups, I
consistently get the group counts above for the rest of the work day.
Second, in the cases where I am able to join 14 groups, if I run my
test program twice simultaneously on the same nodes, I can join a
maximum of 14 groups total between the two running tests (as opposed to
14 per test run). Running the tests simultaneously on disjoint sets of
nodes is not an issue.
So I sent that last email before I meant to :) Need to eat... I've
managed to confuse myself a little here too -- it looks like changing
from the Cisco SM to OpenSM did not change behavior on the 8-node
machine. At least, I'm still getting the same results above now that
it's back on the Cisco SM.
Also some newer results. I had a long run going on the 128-node machine
to see how many groups I really could join, and it just errored out
after joining 892 groups successfully. Specifically, I got an
RDMA_CM_EVENT_MULTICAST_ERROR event containing status -22 ('Unknown
error' according to strerror). errno is still cleared to 'Success'. I
don't have time to go look at the code to see where this came from
right now, but does anyone know what it means?
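One guess: -22 looks like a negated kernel errno, and errno 22 is
EINVAL on Linux. strerror() only understands positive errno values,
which would explain the 'Unknown error' text (on glibc at least). A
quick check:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int status = -22;                  /* as reported in the event       */
    printf("%s\n", strerror(-status)); /* "Invalid argument" (EINVAL)    */
    printf("%s\n", strerror(status));  /* "Unknown error -22" on glibc   */
    return 0;
}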
Andrew
This makes me think the switch is involved; is this correct?
I doubt it. It is either the end station, the SM, or a combination of
the two.
OK.
OK, this makes sense, but I still don't see where all the time is
going. Should the fact that the switches haven't been reprogrammed
since leaving the groups really affect how long it takes to do a
subsequent join? I'm not convinced.
It takes time for the SM to recalculate the multicast tree. While
leaves can
be lazy, I forget whether joins are synchronous or not.
Is the algorithm for recalculating the tree documented at all? Or,
where is the code for it (assuming I have access)? I feel like I'm
missing something here that explains why it's so costly.
Andrew
Is this time being consumed by the switches when they are asked to
reprogram their tables (I assume some sort of routing table is used
internally)?
This is relatively quick compared to the SM's policy for rerouting
multicast based on joins/leaves/group creation/deletion.
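To make the cost concrete, here is a toy sketch of what "recalculating
the multicast tree" can involve -- an illustration only, not OpenSM's
actual algorithm: a traversal of the switch graph per affected group,
followed by pruning and reprogramming the switches on the tree:

/* Toy illustration, not OpenSM's actual algorithm: build a multicast
 * spanning tree with a BFS over the switch graph. A real SM would do
 * something like this per affected group, then prune branches with no
 * members below them and push updated multicast forwarding tables to
 * each switch on the tree -- which is where the time can go. */
#include <stdbool.h>

#define MAX_NODES 256

struct fabric {
    int  n;                           /* number of switch nodes     */
    bool link[MAX_NODES][MAX_NODES];  /* adjacency matrix           */
    bool member[MAX_NODES];           /* nodes with joined members  */
};

/* Fills parent[] describing the tree; -1 marks the root/unreached. */
static void build_mcast_tree(const struct fabric *f, int root, int *parent)
{
    int  queue[MAX_NODES], head = 0, tail = 0;
    bool seen[MAX_NODES] = { false };

    for (int i = 0; i < f->n; i++)
        parent[i] = -1;

    queue[tail++] = root;
    seen[root] = true;

    while (head < tail) {             /* plain BFS: O(V^2) here     */
        int u = queue[head++];
        for (int v = 0; v < f->n; v++) {
            if (f->link[u][v] && !seen[v]) {
                seen[v] = true;
                parent[v] = u;
                queue[tail++] = v;
            }
        }
    }
    /* Pruning with f->member[] and reprogramming the switches would
     * follow here in a real SM. */
}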
OK. Thanks for the insight.
Andrew