Hi, since yesterday I have noticed that NetPIPE and sometimes IMB are hanging. As far as I can see, both processes are stuck in a receive. The weird thing is that if I run it in a debugger, everything works fine.
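For reference, the pattern involved is just a plain blocking ping-pong. Below is a minimal sketch of what the two processes are doing when they block in the receive -- this is not the actual NetPIPE/IMB source, and the buffer size and iteration count are arbitrary choices for illustration:

/* pingpong.c: minimal blocking ping-pong sketch (not the real NetPIPE/IMB
 * code); message size and iteration count are arbitrary. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    const int len = 1 << 20;   /* 1 MiB payload, chosen arbitrarily */
    const int iters = 100;     /* arbitrary iteration count */
    char *buf;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = malloc(len);
    memset(buf, 0, len);

    if (size >= 2) {
        for (i = 0; i < iters; ++i) {
            if (0 == rank) {
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (1 == rank) {
                /* a blocking receive like the one the processes appear
                 * to be stuck in */
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Something like "mpirun -np 2 --mca btl openib,self ./pingpong" (the same BTL selection as in the runs quoted below) should exercise the same code path.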
Cheers,
Sven

On Tuesday 31 July 2007 23:47, Jeff Squyres wrote:
> I'm getting a pile of test failures when running with the openib and
> tcp BTLs on the trunk. Gleb is getting some failures, too, but his
> seem to be different than mine.
>
> Here's what I'm seeing from manual MTT runs on my SVN/development
> install -- did you know that MTT could do that? :-)
>
> +-------------+-------------------+------+------+----------+------+
> | Phase       | Section           | Pass | Fail | Time out | Skip |
> +-------------+-------------------+------+------+----------+------+
> | Test Run    | intel             | 442  | 0    | 26       | 0    |
> | Test Run    | ibm               | 173  | 3    | 1        | 3    |
> +-------------+-------------------+------+------+----------+------+
>
> The tests that are failing are:
>
> *** WARNING: Test: MPI_Recv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irecv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Rsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Reduce_loc_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: comm_join, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: getcount, np=16, variant=1: FAILED
> *** WARNING: Test: spawn, np=3, variant=1: FAILED
> *** WARNING: Test: spawn_multiple, np=3, variant=1: FAILED
>
> I'm not too worried about the comm spawn/join tests because I think
> they're heavily oversubscribing the nodes and therefore timing out.
> These were all from a default trunk build running with "mpirun --mca
> btl openib,self".
>
> For all of these tests, I'm running on 4 nodes, 4 cores each, but
> they have varying numbers of network interfaces:
>
>              nodes 1,2           nodes 3,4
> openib       3 active ports      2 active ports
> tcp          4 tcp interfaces    3 tcp interfaces
>
> Is anyone else seeing these kinds of failures?
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
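As an aside: if I read the test names right, the _ator_ tests are basically an all-ranks-to-root send pattern. A rough sketch of that pattern (not the actual Intel test suite source; the message length below is an arbitrary assumption):

/* ator.c: rough sketch of an all-ranks-to-root send pattern, similar in
 * spirit to the MPI_Send_ator_c-style tests (not the real test source). */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, src;
    const int len = 65536;   /* arbitrary message length */
    char *buf;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = malloc(len);
    memset(buf, 0, len);

    if (0 == rank) {
        /* root receives one message from every other rank */
        for (src = 1; src < size; ++src) {
            MPI_Recv(buf, len, MPI_CHAR, src, 0, MPI_COMM_WORLD, &status);
        }
    } else {
        /* all other ranks send a single message to the root */
        MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

At np=16 that means 15 ranks sending to rank 0 at the same time.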