On Aug 14, 2007, at 2:12 PM, Andrew Friedley wrote:
> An MPI is needed to compile/run the test. No arguments are needed;
> the test repeatedly joins groups (without leaving them) until an
> error occurs, then intentionally hangs.
Just to clarify for those not familiar with MPI -- MPI is not used in
the multicast portion of the test. It's only used to bootstrap /
launch the test and as an "out of band" messaging mechanism, so that
each process can know when a group has been joined, etc.
So you can even run this test with a TCP-only MPI, to be sure that
the MPI traffic itself is not skewing any IB stack issues.
> Here's some of the different behaviors I see with this test (OFED
> v1.2 is always used):
FWIW, I've been trying to help Andrew run this test, and I always run
into one of two errors:
- Running 1 proc each on 2 nodes joins 2 groups, then:
    0 ERROR rdma_join_multicast(): 99 Cannot assign requested address
- Running 4 procs on 1 node joins a few tens of groups (a different
  number every time), then:
    ERROR event 13, status -110 Operation now in progress, forcing job to hang
The nodes are all RHEL4U4 running OFED 1.2; each node has 4 cores.
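
For reference, the numeric codes in the two error lines above can be decoded with a short script. This is only a sketch, assuming the numbers are Linux errno values (the sign is dropped, since kernel-style statuses are reported as negative errno). The helper name describe_status is made up for illustration and is not part of the test. On Linux, 99 is EADDRNOTAVAIL, which matches the text printed in the first error; 110 is ETIMEDOUT, whereas "Operation now in progress" is the text for EINPROGRESS, so in the second error the message may have been generated from a different value than the status shown. That last point is a guess, not something confirmed by the test source. Event 13 appears to be RDMA_CM_EVENT_MULTICAST_ERROR, but that should be checked against the installed rdma_cma.h.

```python
import errno
import os


def describe_status(status):
    """Return (symbolic name, message text) for an errno-style status.

    Hypothetical helper for illustration only. Negative values are
    treated as negated errno codes, as kernel-style statuses often are.
    """
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 99 and -110 are the values reported above; EINPROGRESS is included
    # because its message text is what the second error line printed.
    for status in (99, -110, errno.EINPROGRESS):
        name, message = describe_status(status)
        print(f"{status:5d} -> {name}: {message}")
```

Run it on one of the affected nodes, since errno numbering is platform-specific.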
--
Jeff Squyres
Cisco Systems