Hi,

since yesterday I have noticed that Netpipe and sometimes IMB are hanging. As far 
as I can tell, both processes are stuck in a receive. The weird thing is that if I run 
them under a debugger, everything works fine.
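For a hang that disappears when started under a debugger, attaching to the
already-hung process usually still reproduces it. A rough sketch of how one
might grab backtraces from a stuck rank (the binary name NPmpi and the PID
are placeholders, not taken from this report):

```shell
# On the node where a rank is stuck, find its PID
# (NPmpi is assumed here as the NetPIPE MPI binary name; adjust as needed).
pgrep -l NPmpi

# Attach gdb to the live process without restarting it and
# dump a backtrace of every thread, then detach.
gdb -p <pid> -batch -ex 'thread apply all bt'
```

Comparing the backtraces from both ranks should show whether the receive is
blocked inside the BTL progress loop or waiting on a message that was never
sent.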

Cheers,
  Sven 

On Tuesday 31 July 2007 23:47, Jeff Squyres wrote:
> I'm getting a pile of test failures when running with the openib and  
> tcp BTLs on the trunk.  Gleb is getting some failures, too, but his  
> seem to be different from mine.
> 
> Here's what I'm seeing from manual MTT runs on my SVN/development  
> install -- did you know that MTT could do that? :-)
> 
> +-------------+-------------------+------+------+----------+------+
> | Phase       | Section           | Pass | Fail | Time out | Skip |
> +-------------+-------------------+------+------+----------+------+
> | Test Run    | intel             | 442  | 0    | 26       | 0    |
> | Test Run    | ibm               | 173  | 3    | 1        | 3    |
> +-------------+-------------------+------+------+----------+------+
> 
> The tests that are failing are:
> 
> *** WARNING: Test: MPI_Recv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irecv_pack_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Irsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Ssend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Rsend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Reduce_loc_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Isend_rtoa_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Send_ator2_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: MPI_Issend_ator_c, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: comm_join, np=16, variant=1: TIMED OUT (failed)
> *** WARNING: Test: getcount, np=16, variant=1: FAILED
> *** WARNING: Test: spawn, np=3, variant=1: FAILED
> *** WARNING: Test: spawn_multiple, np=3, variant=1: FAILED
> 
> I'm not too worried about the comm spawn/join tests because I think  
> they're heavily oversubscribing the nodes and therefore timing out.   
> These were all from a default trunk build running with "mpirun --mca  
> btl openib,self".
> 
> For all of these tests, I'm running on 4 nodes, 4 cores each, but  
> they have varying numbers of network interfaces:
> 
>            nodes 1,2          nodes 3,4
> openib    3 active ports     2 active ports
> tcp       4 tcp interfaces   3 tcp interfaces
> 
> Is anyone else seeing these kinds of failures?
> 
> -- 
> Jeff Squyres
> Cisco Systems
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 