Thanks, george.
On Apr 27, 2008, at 12:33 PM, Gleb Natapov wrote:
> On Sun, Apr 27, 2008 at 07:00:57PM +0300, Lenny Verkhovsky wrote:
> > Hi, all
> > I faced the "Unbelievable situation"
> > during running IMB benchmark.
>
> The situation is believable, but commit r18274, that adds this output, is
> not, as it doesn't take into account sequence number wrap around.
>
> > /home/USERS/lenny/OMPI_ORTE_LMC/bin/mpirun -np 96 --bynode -hostfile hostfile_ompi -mca btl_openib_max_lmc 1 ./IMB-MPI1 PingPong PingPing Sendrecv Exchange Allreduce Reduce Reduce_scatter Bcast Barrier
> >
> > #----------------------------------------------------------------
> > # Benchmarking Allreduce
> > # #processes = 96
> > #----------------------------------------------------------------
> > #Benchmarking  #procs  #bytes  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
> > Allreduce          96       0          1000         0.02         0.03         0.02
> > Allreduce          96       4          1000       297.88       298.07       297.95
> > Allreduce          96       8          1000       296.15       296.32       296.24
> > Allreduce          96      16          1000       297.99       298.17       298.09
> > Allreduce          96      32          1000       296.97       297.20       297.04
> > Allreduce          96      64          1000       298.43       298.64       298.49
> > Allreduce          96     128          1000       296.86       297.07       296.93
> > Allreduce          96     256          1000       298.00       298.30       298.09
> > Allreduce          96     512          1000       296.79       296.96       296.85
> > Allreduce          96    1024          1000       299.23       299.39       299.31
> > Allreduce          96    2048          1000       295.51       295.64       295.57
> > Allreduce          96    4096          1000       246.02       246.13       246.08
> > Allreduce          96    8192          1000       492.52       492.74       492.63
> > Allreduce          96   16384          1000      5380.59      5381.47      5381.10
> > Allreduce          96   32768          1000      5372.86      5373.69      5373.36
> > Allreduce          96   65536           640      5470.41      5471.88      5471.16
> > Allreduce          96  131072           320      5554.52      5556.82      5555.75
> > [witch24:15639] Unbelievable situation ... we got a duplicated fragment
> > with seq number of 0 (expected 65534) from witch23
> > [witch24:15639] Unbelievable situation ... we got a duplicated fragment
> > with seq number of 65116 (expected 65534) from witch23
> > [witch24:15639] *** Process received signal ***
> > [witch24:15639] Signal: Segmentation fault (11)
> > [witch24:15639] Signal code: Address not mapped (1)
> > [witch24:15639] Failing at address: 0x632457d0
> > [witch24:15639] [ 0] /lib64/libpthread.so.0 [0x2b7929a9bc10]
> > [witch24:15639] [ 1] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_allocator_bucket.so [0x2b792aa47d34]
> > [witch24:15639] [ 2] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_pml_ob1.so [0x2b792b172163]
> > [witch24:15639] [ 3] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_btl_openib.so [0x2b792b6b0772]
> > [witch24:15639] [ 4] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_btl_openib.so [0x2b792b6b15ff]
> > [witch24:15639] [ 5] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_bml_r2.so [0x2b792b38307f]
> > [witch24:15639] [ 6] /home/USERS/lenny/OMPI_ORTE_LMC/lib/libopen-pal.so.0(opal_progress+0x4a) [0x2b79294cd16a]
> > [witch24:15639] [ 7] /home/USERS/lenny/OMPI_ORTE_LMC/lib/libmpi.so.0 [0x2b79292163a8]
> > [witch24:15639] [ 8] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_coll_tuned.so [0x2b792c077cb7]
> > [witch24:15639] [ 9] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_coll_tuned.so [0x2b792c07b296]
> > [witch24:15639] [10] /home/USERS/lenny/OMPI_ORTE_LMC/lib/libmpi.so.0(PMPI_Allreduce+0x1e7) [0x2b7929229907]
> > [witch24:15639] [11] ./IMB-MPI1(IMB_allreduce+0x8e) [0x40764e]
> > [witch24:15639] [12] ./IMB-MPI1(main+0x3aa) [0x4034ea]
> > [witch24:15639] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b7929bc2154]
> > [witch24:15639] [14] ./IMB-MPI1 [0x4030a9]
> > [witch24:15639] *** End of error message ***
> > --------------------------------------------------------------------------
> >
> > Best Regards,
> > Lenny.
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Gleb.
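To illustrate the wrap-around point in Gleb's reply, here is a minimal sketch in C. It is not the code from r18274 or from the ob1 PML. It assumes the sequence numbers are 16-bit, which is only inferred from the values in the log (65534 followed by 0), and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed distance from `expected` to `received`
 * in an assumed 16-bit sequence space. The subtraction is reduced
 * modulo 65536 and then mapped into the range [-32768, 32767]. */
static int seq_delta(uint16_t received, uint16_t expected)
{
    uint16_t d = (uint16_t)(received - expected); /* modulo 65536 */
    return d < 0x8000u ? (int)d : (int)d - 0x10000;
}

/* Naive check: treats any numerically smaller sequence number as old. */
static int seq_is_behind_naive(uint16_t received, uint16_t expected)
{
    return received < expected;
}

/* Wrap-aware check: a fragment is old only if its signed distance
 * from the expected sequence number is negative. */
static int seq_is_behind(uint16_t received, uint16_t expected)
{
    return seq_delta(received, expected) < 0;
}

/* Uses the first pair of values from the log above. */
static void seq_selfcheck(void)
{
    assert(seq_is_behind_naive(0, 65534));  /* naive: looks like a duplicate */
    assert(!seq_is_behind(0, 65534));       /* wrap-aware: it is not behind */
    assert(seq_delta(0, 65534) == 2);       /* 0 is two steps past 65534 */
}
```

With a plain `<` comparison, sequence number 0 looks older than 65534. Under modulo-65536 arithmetic it is two steps ahead, which is the wrap-around case. The sketch covers only the comparison and says nothing about how fragments are actually matched in Open MPI.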