On Aug 14, 2007, at 2:12 PM, Andrew Friedley wrote:

> An MPI is needed to compile/run the test. No arguments are needed; the test repeatedly joins groups (without leaving them) until an error occurs, then intentionally hangs.

Just to clarify for those not familiar with MPI -- MPI is not used in the multicast portion of the test. It is used only to bootstrap / launch the test and as an "out of band" messaging mechanism, so that you can know when the group has been joined, etc.

So you can even run this test with a TCP-only MPI, to be sure that MPI itself is not skewing any IB stack issues.

> Here are some of the different behaviors I see with this test (OFED v1.2 is always used):

FWIW, I've been trying to help Andrew run this test, and I always run into one of two errors:

- Running 1 proc on each of 2 nodes joins 2 groups, then:

0 ERROR rdma_join_multicast(): 99 Cannot assign requested address

- Running 4 procs on 1 node joins a few tens of groups (a different number every time) and then:

ERROR event 13, status -110 Operation now in progress, forcing job to hang
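
When reading the two messages above, it can help to decode the numbers separately from the text. The helper below is a small sketch (the name status_text is hypothetical and not part of the test) that turns a positive errno or a negative kernel-style status into its strerror() string.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: accepts either a positive errno value or a negative
 * kernel-style status (such as -110) and returns the matching strerror() text.
 */
static const char *status_text(int status)
{
    return strerror(status < 0 ? -status : status);
}

/* Print a code next to its decoded text, e.g. for comparing against a log. */
static void print_status(int status)
{
    printf("%d -> %s\n", status, status_text(status));
}
```

On Linux, 99 is EADDRNOTAVAIL, which matches the text in the first message. 110 is ETIMEDOUT, while "Operation now in progress" is the text for EINPROGRESS (115), so the number and the text in the second message may not come from the same value. That last part is only a guess from the output, not something confirmed against the test source.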

The nodes are all RHEL4U4 running OFED 1.2; each node has 4 cores.

--
Jeff Squyres
Cisco Systems
