Some updates on this problem.
The code I'm using to test/produce this behavior is an MPI program. MPI
is used only for convenience of job startup and collection of results;
the actual test/benchmark uses straight RDMA CM & ibverbs. What I'm
doing is timing how long it takes to join and bring up a multicast group
with a varying number of processes and existing groups. One rank joins
with a '0' address to get a real address, MPI_Bcast's that address to
the other ranks, which then join the group. Meanwhile the root rank
repeatedly sends a small ping message to the group. Each of the other
ranks times from when it calls rdma_join_multicast() to the arrival of
the join event, and to when it first receives a message on that group.
Once completed, the process repeats N times, leaving all the groups
joined.
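For reference, the per-rank join-and-time step looks roughly like the
sketch below. This is NOT the actual benchmark code -- it's a minimal
illustration assuming librdmacm's synchronous event channel, with the
MPI plumbing, QP setup, CQ polling for the ping, and most error handling
elided; cm_id, channel, and grp_addr are assumed to be set up elsewhere
(grp_addr being the address the root obtained from a zeroed-sockaddr
join and MPI_Bcast'd out):

```c
/* Sketch of the per-rank join timing step (illustration only).
 * Assumes: cm_id is an rdma_cm_id bound to a UD QP, channel is its
 * event channel, and grp_addr is the multicast address broadcast from
 * the root rank. */
#include <rdma/rdma_cma.h>
#include <sys/time.h>
#include <stdio.h>

static double now_secs(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static int timed_join(struct rdma_cm_id *cm_id,
                      struct rdma_event_channel *channel,
                      struct sockaddr *grp_addr)
{
    struct rdma_cm_event *event;
    double t0 = now_secs();

    /* This is the call that fails on the 15th group with errno 99
     * (EADDRNOTAVAIL). */
    if (rdma_join_multicast(cm_id, grp_addr, NULL))
        return -1;

    /* Block until the SM completes the join. */
    if (rdma_get_cm_event(channel, &event))
        return -1;
    if (event->event != RDMA_CM_EVENT_MULTICAST_JOIN) {
        rdma_ack_cm_event(event);
        return -1;
    }
    printf("join -> event: %.6f sec\n", now_secs() - t0);
    rdma_ack_cm_event(event);

    /* ...then poll the recv CQ for the root's ping and record the
     * join -> first-message time the same way. */
    return 0;
}
```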
I'm now running OFED v1.2, and the behavior has not changed as a result,
though I've noticed some other cases. First -- if nothing on the network
has used multicast for a while, I'm able to join a total of only 4
groups with my benchmark. After that, running it any number of times, I
can join 14 groups as described below.
Now the more interesting part. I'm now able to run on a 128-node
machine with OpenSM running on a node (before, I was running on an
8-node machine which I'm told is running the Cisco SM on a Topspin
switch). On this machine, if I run my benchmark with two processes per
node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join
> 750 groups simultaneously from one QP on each process. Stranger
still, I can join only 4 groups running the same thing on the 8-node
machine.
While doing this I noticed that the time from calling
rdma_join_multicast() to the event arrival stayed fairly constant (in
the .001 sec range), while the time from the join call to actually
receiving messages on the group steadily increased from around .1 sec
to around 2.7 sec with 750+ groups. Furthermore, this time does not
drop back to .1 sec if I stop the benchmark and run it (or any of my
other multicast code) again. That's understandable within a single
program run, but the fact that the behavior persists across runs
concerns me -- it feels like a bug, but I don't have much concrete
evidence here.
Sorry for the long email -- I'm trying to provide as much detail as
possible so this can get fixed. I'm really not sure where to start
looking on my own, so even some hints on where the problem(s) might lie
would be useful.
Andrew
Andrew Friedley wrote:
I've run into a problem where it appears that I cannot join more than 14
multicast groups from a single HCA. I'm using the RDMA CM UD/multicast
interface from an OFED v1.2 nightly build, and using a '0' address when
joining so that the SM allocates an unused address. The first 14
rdma_join_multicast() calls succeed, a MULTICAST_JOIN event comes
through for each of them, and everything works. But the 15th call to
rdma_join_multicast() returns -1 and sets errno to 99 (EADDRNOTAVAIL,
'Cannot assign requested address').
Note that I'm using a single QP per process to do all the joins. Things
get weirder if I run two instances of my program on the same node -- as
soon as the total between the two instances reaches 14, neither instance
can join any more groups. Also, right now my code hangs when this
happens -- if I kill off one of the two instances and run a third
instance (while leaving the other hung, holding some number of groups),
the third instance is not able to join ANY groups. The behavior resets
when I kill all instances.
Two instances running on separate nodes (on the same network) do not
appear to interfere with each other like described above; they do still
error out on the 15th join.
This feels like a bug to me; regardless, this limit is WAY too low.
Any ideas what might be going on, or how I can work around it?
Andrew
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general