I'm getting a pile of test failures when running with the openib and
tcp BTLs on the trunk. Gleb is getting some failures, too, but his
seem to be different from mine.
Here's what I'm seeing from manual MTT runs on my SVN/development
install -- did you know that MTT could do that? :-)
+-------------+-------------------+------+------+----------+------+
| Phase       | Section           | Pass | Fail | Time out | Skip |
+-------------+-------------------+------+------+----------+------+
| Test Run    | intel             |  442 |    0 |       26 |    0 |
| Test Run    | ibm               |  173 |    3 |        1 |    3 |
+-------------+-------------------+------+------+----------+------+
The tests that are failing are:
*** WARNING: Test: MPI_Recv_pack_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Ssend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Irecv_pack_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Irsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Ssend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Rsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Reduce_loc_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_ator2_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Issend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Isend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Send_ator2_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: MPI_Issend_ator_c, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: comm_join, np=16, variant=1: TIMED OUT (failed)
*** WARNING: Test: getcount, np=16, variant=1: FAILED
*** WARNING: Test: spawn, np=3, variant=1: FAILED
*** WARNING: Test: spawn_multiple, np=3, variant=1: FAILED
I'm not too worried about the comm spawn/join tests because I think
they're heavily oversubscribing the nodes and therefore timing out.
These were all from a default trunk build running with "mpirun --mca
btl openib,self".
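For anyone trying to reproduce this, a representative invocation for one of
the timed-out tests would look something like the following. The "--mca btl
openib,self" part is the exact setting quoted above; the test binary name is
taken from the failure list, but the working directory and the tcp comparison
run are my assumptions, not the literal commands used:

```shell
# Run one of the failing Intel tests over the openib BTL ("self" is
# needed for process loopback), 16 processes across the 4 nodes:
mpirun --mca btl openib,self -np 16 ./MPI_Send_rtoa_c

# Hypothetical comparison run of the same test over the tcp BTL:
mpirun --mca btl tcp,self -np 16 ./MPI_Send_rtoa_c
```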
For all of these tests, I'm running on 4 nodes, 4 cores each, but
they have varying numbers of network interfaces:
            nodes 1,2           nodes 3,4
   openib   3 active ports      2 active ports
   tcp      4 tcp interfaces    3 tcp interfaces
Is anyone else seeing these kinds of failures?
--
Jeff Squyres
Cisco Systems