I've tried the new rc. Here is what I got:

1) I successfully built it with intel-13.1 and gcc-4.7.2, but the build failed with open64-4.5.2 and ekopath-5.0.0 (PathScale). In both cases the problem is in the Fortran part. Each time I used the following configuration line:

CC=$CC CXX=$CXX F77=$F77 FC=$FC ./configure --prefix=$prefix --with-knem=$knem_path

Open64 failed during configuration with the following:

*** Fortran compiler
checking whether we are using the GNU Fortran compiler... yes
checking whether openf95 accepts -g... yes
configure: WARNING: Open MPI now ignores the F77 and FFLAGS environment variables; only the FC and FCFLAGS environment variables are used.
checking whether ln -s works... yes
checking if Fortran compiler works... yes
checking for extra arguments to build a shared library... none needed
checking for Fortran flag to compile .f files... none
checking for Fortran flag to compile .f90 files... none
checking to see if Fortran compilers need additional linker flags... none
checking external symbol convention... double underscore
checking if C and Fortran are link compatible... yes
checking to see if Fortran compiler likes the C++ exception flags... skipped (no C++ exceptions flags)
checking to see if mpifort compiler needs additional linker flags... none
checking if Fortran compiler supports CHARACTER... yes
checking size of Fortran CHARACTER... 1
checking for C type corresponding to CHARACTER... char
checking alignment of Fortran CHARACTER... 1
checking for corresponding KIND value of CHARACTER... C_SIGNED_CHAR
checking KIND value of Fortran C_SIGNED_CHAR... no ISO_C_BINDING -- fallback
checking Fortran value of selected_int_kind(4)... no
configure: WARNING: Could not determine KIND value of C_SIGNED_CHAR
configure: WARNING: See config.log for more details
configure: error: Cannot continue
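The failing step appears to be the probe for the Fortran KIND of C_SIGNED_CHAR via ISO_C_BINDING. A minimal standalone check along these lines (just a sketch, not the exact test configure runs) exercises roughly the same thing with openf95:

# Sketch: does this compiler's ISO_C_BINDING expose a usable C_SIGNED_CHAR kind?
cat > kindprobe.f90 <<'EOF'
program kindprobe
  use, intrinsic :: iso_c_binding
  implicit none
  integer(kind=C_SIGNED_CHAR) :: i   ! will not compile if the kind is unusable
  print *, 'C_SIGNED_CHAR kind = ', C_SIGNED_CHAR
end program kindprobe
EOF
openf95 kindprobe.f90 -o kindprobe && ./kindprobe

If that fails in the same way, it would at least narrow the problem down to the compiler's ISO_C_BINDING support rather than anything specific to Open MPI's configure.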
Ekopath failed during make with the following error:

  PPFC     mpi-f08-sizeof.lo
  PPFC     mpi-f08.lo
In file included from mpi-f08.F90:37:
mpi-f-interfaces-bind.h:1908: warning: extra tokens at end of #endif directive
mpi-f-interfaces-bind.h:2957: warning: extra tokens at end of #endif directive
In file included from mpi-f08.F90:38:
pmpi-f-interfaces-bind.h:1911: warning: extra tokens at end of #endif directive
pmpi-f-interfaces-bind.h:2963: warning: extra tokens at end of #endif directive
pathf95-1044 pathf95: INTERNAL OMPI_OP_CREATE_F, File = mpi-f-interfaces-bind.h, Line = 955, Column = 29
  Internal : Unexpected ATP_PGM_UNIT in check_interoperable_pgm_unit()
make[2]: *** [mpi-f08.lo] Error 1
make[2]: Leaving directory `/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi/mpi/fortran/use-mpi-f08'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/mpi_install_tmp1400/openmpi-1.7rc8/ompi'
make: *** [all-recursive] Error 1

This seems to be a different error from the one I got last time with rc7, and again I'm not enough of a Fortran person to interpret it. I used the following version of the compiler:
http://c591116.r16.cf2.rackcdn.com/ekopath/nightly/Linux/ekopath-2013-02-26-installer.run

2) I ran a couple of tests (IMB) with the new version on a system of 10 nodes with Intel Sandy Bridge processors and FDR ConnectX-3 InfiniBand adapters. First I tried the following parameters:

mpirun -np $NP -hostfile hosts --mca btl openib,sm,self --bind-to-core -npernode 16 --mca mpi_leave_pinned 1 ./IMB-MPI1 -npmin $NP -mem 4G $COLL

This combination complained about mpi_leave_pinned. The same line works with 1.6.3. Is something different in the new release that I've missed?

--------------------------------------------------------------------------
A process attempted to use the "leave pinned" MPI feature, but no
memory registration hooks were found on the system at run time.  This
may be the result of running on a system that does not support memory
hooks or having some other software subvert Open MPI's use of the
memory hooks.

You can disable Open MPI's use of memory hooks by setting both the
mpi_leave_pinned and mpi_leave_pinned_pipeline MCA parameters to 0.

Open MPI will disable any transports that are attempting to use the
leave pinned functionality; your job may still run, but may fall back
to a slower network transport (such as TCP).

  Mpool name: grdma
  Process:    [[13305,1],1]
  Local host: b23
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them).  This
is most certainly not what you wanted.  Check your cables, subnet
manager configuration, etc.  The openib BTL will be ignored for this
job.

  Local host: b23
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[13305,1],0]) is on host: b22
  Process 2 ([[13305,1],1]) is on host: b23
  BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
...
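The help text itself suggests setting both mpi_leave_pinned and mpi_leave_pinned_pipeline to 0. A sketch of that rerun (the same command as above, with only the leave-pinned parameters changed) would be:

# Sketch: disable leave-pinned as the help message suggests, to check
# whether the openib BTL then stays active on this system.
mpirun -np $NP -hostfile hosts \
    --mca btl openib,sm,self --bind-to-core -npernode 16 \
    --mca mpi_leave_pinned 0 --mca mpi_leave_pinned_pipeline 0 \
    ./IMB-MPI1 -npmin $NP -mem 4G $COLL

From the order of the messages it looks like the failed leave-pinned setup caused the openib BTL to be excluded, which is why only self and sm were attempted and the job aborted.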
Then I ran a couple of point-to-point and collective tests. In general the performance improved compared to 1.6.3, but there are several cases where it got worse. Perhaps I need to do some tuning; could you please tell me which parameters would suit my system better than the defaults?

Here is what I got for PingPong and PingPing with 1.7rc8 (the parameters above changed to use "-npernode 1"):

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes      #repetitions  t[usec]    Mbytes/sec
0           1000          1.39       0.00
1           1000          1.50       0.64
2           1000          1.10       1.73
4           1000          1.10       3.46
8           1000          1.12       6.80
16          1000          1.12       13.62
32          1000          1.14       26.75
64          1000          1.18       51.92
128         1000          1.73       70.42
256         1000          1.85       132.04
512         1000          1.98       247.16
1024        1000          2.26       431.52
2048        1000          2.85       684.58
4096        1000          3.49       1118.63
8192        1000          4.48       1741.96
16384       1000          9.58       1630.92
32768       1000          14.27      2189.46
65536       640           23.03      2713.71
131072      320           35.55      3515.73
262144      160           57.65      4336.77
524288      80            101.42     4930.05
1048576     40            188.00     5319.18
2097152     20            521.70     3833.61
4194304     10            1118.20    3577.19

#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
#---------------------------------------------------
#bytes      #repetitions  t[usec]    Mbytes/sec
0           1000          1.26       0.00
1           1000          1.32       0.72
2           1000          1.32       1.44
4           1000          1.35       2.84
8           1000          1.38       5.53
16          1000          1.13       13.51
32          1000          1.13       26.96
64          1000          1.17       51.95
128         1000          1.72       70.96
256         1000          1.80       135.63
512         1000          1.94       251.17
1024        1000          2.23       437.51
2048        1000          2.88       677.47
4096        1000          3.49       1119.28
8192        1000          4.75       1643.41
16384       1000          9.90       1578.12
32768       1000          14.54      2149.25
65536       640           24.04      2599.79
131072      320           37.00      3378.35
262144      160           60.25      4149.39
524288      80            105.74     4728.77
1048576     40            196.73     5083.23
2097152     20            785.79     2545.20
4194304     10            1790.19    2234.40

And 1.6.3 gave the following:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes      #repetitions  t[usec]    Mbytes/sec
0           1000          1.06       0.00
1           1000          0.94       1.01
2           1000          0.95       2.02
4           1000          0.95       4.01
8           1000          0.97       7.90
16          1000          0.98       15.63
32          1000          0.99       30.86
64          1000          1.02       59.60
128         1000          1.58       77.23
256         1000          1.71       142.73
512         1000          1.86       263.15
1024        1000          2.13       459.35
2048        1000          2.72       718.31
4096        1000          3.27       1194.74
8192        1000          4.33       1802.57
16384       1000          6.20       2521.78
32768       1000          8.84       3535.46
65536       640           14.28      4376.82
131072      320           24.97      5005.06
262144      160           44.94      5562.46
524288      80            86.76      5763.29
1048576     40            168.73     5926.77
2097152     20            333.65     5994.32
4194304     10            666.09     6005.16

#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
#---------------------------------------------------
#bytes      #repetitions  t[usec]    Mbytes/sec
0           1000          0.93       0.00
1           1000          0.97       0.98
2           1000          0.97       1.97
4           1000          0.97       3.94
8           1000          0.99       7.70
16          1000          0.99       15.34
32          1000          1.01       30.21
64          1000          1.05       58.13
128         1000          1.61       75.82
256         1000          1.73       141.20
512         1000          1.88       259.87
1024        1000          2.17       450.21
2048        1000          2.83       691.13
4096        1000          3.45       1131.26
8192        1000          4.76       1639.88
16384       1000          7.76       2014.01
32768       1000          10.34      3021.35
65536       640           16.29      3836.55
131072      320           26.72      4678.40
262144      160           48.83      5120.31
524288      80            91.85      5443.61
1048576     40            178.65     5597.63
2097152     20            351.31     5692.98
4194304     10            701.69     5700.53

Sendrecv and Exchange also got worse; I can send additional data if needed. The performance of the collectives generally improved slightly compared to 1.6.3 or stayed the same, but in certain cases I got much better results with the tuned collectives.
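In case it helps with the tuning question, one way to see what the two releases pick by default is to dump the relevant MCA parameters from each installation and diff them. This is just a sketch (the install paths are placeholders, and on 1.7 an additional --level option may be needed to show every parameter):

# Sketch: compare default coll/tuned and openib parameters between the two builds.
/opt/openmpi-1.6.3/bin/ompi_info --param coll tuned  > tuned-1.6.3.txt
/opt/openmpi-1.7rc8/bin/ompi_info --param coll tuned > tuned-1.7rc8.txt
/opt/openmpi-1.6.3/bin/ompi_info --param btl openib  > openib-1.6.3.txt
/opt/openmpi-1.7rc8/bin/ompi_info --param btl openib > openib-1.7rc8.txt
diff -u tuned-1.6.3.txt tuned-1.7rc8.txt
diff -u openib-1.6.3.txt openib-1.7rc8.txt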
In particular, these suited my system better:

--mca coll_tuned_barrier_algorithm 6 (default and tuned):

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 160
#---------------------------------------------------
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
1000          49.75        49.77        49.76

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 160
#---------------------------------------------------
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
1000          12.74        12.74        12.74

Bcast for small messages, --mca coll_tuned_bcast_algorithm 3 (default and tuned):

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 160
#----------------------------------------------------------------
#bytes      #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
0           1000          0.01         0.02         0.02
1           1000          9.87         9.96         9.92
2           1000          10.44        10.51        10.47
4           1000          10.30        10.37        10.34
8           1000          10.34        10.43        10.38
16          1000          10.39        10.48        10.43
32          1000          10.36        10.43        10.40
64          1000          10.38        10.44        10.41
128         1000          10.11        10.22        10.17
256         1000          11.37        11.54        11.48
512         1000          14.09        14.25        14.19
1024        1000          18.77        19.03        18.94
2048        1000          13.47        13.63        13.58
4096        1000          25.39        25.60        25.55
8192        1000          50.80        51.11        51.04
16384       1000          102.64       103.53       103.38
32768       1000          280.86       281.80       281.62
65536       640           387.10       391.90       391.26
131072      320           779.58       796.04       794.30
262144      160           1526.52      1597.39      1590.31
524288      80            355.67       379.06       375.27
1048576     40            702.95       753.65       736.29
2097152     20            1518.11      1580.85      1551.57
4194304     10            3183.22      3931.81      3676.94

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 160
#----------------------------------------------------------------
#bytes      #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
0           1000          0.01         0.02         0.02
1           1000          4.54         5.13         4.85
2           1000          4.50         5.11         4.81
4           1000          4.50         5.09         4.80
8           1000          4.48         5.09         4.79
16          1000          4.49         5.09         4.79
32          1000          4.55         5.15         4.86
64          1000          4.52         5.14         4.83
128         1000          4.66         5.28         4.98
256         1000          4.78         5.40         5.09
512         1000          4.89         5.52         5.21
1024        1000          5.15         5.81         5.48
2048        1000          5.60         6.30         5.94
4096        1000          8.25         8.67         8.46
8192        1000          10.49        11.01        10.76
16384       1000          20.05        20.87        20.50
32768       1000          30.11        31.41        30.80
65536       640           46.08        48.94        47.54
131072      320           75.53        84.98        80.26
262144      160           134.26       169.44       151.92
524288      80            240.34       372.76       307.80
1048576     40            427.00       951.02       699.41
2097152     20            933.41       3170.45      2076.21
4194304     10            2682.40      16020.39     9718.86

and Allgatherv, --mca coll_tuned_allgatherv_algorithm 5 (default and tuned):

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 160
#----------------------------------------------------------------
#bytes      #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
0           1000          0.06         0.07         0.06
1           1000          54.11        54.15        54.13
2           1000          52.74        52.78        52.76
4           1000          55.09        55.13        55.11
8           1000          58.48        58.52        58.50
16          1000          61.99        62.03        62.01
32          1000          69.31        69.35        69.32
64          1000          88.13        88.18        88.16
128         1000          126.62       126.71       126.68
256         1000          215.26       215.34       215.31
512         1000          832.54       833.01       832.57
1024        1000          928.81       929.31       928.86
2048        1000          1072.77      1073.35      1072.85
4096        1000          1222.82      1223.42      1222.90
8192        1000          1713.46      1714.13      1713.87
16384       1000          2596.87      2598.31      2597.40
32768       1000          4153.70      4154.09      4153.92
65536       640           6795.04      6796.32      6795.83
131072      320           12076.74     12083.04     12080.28
262144      160           23120.98     23153.76     23138.10
524288      80            49077.99     49204.79     49142.48
1048576     40            132120.25    132675.60    132400.38
2097152     20            240537.20    241821.05    241138.53
4194304     10            457125.71    459065.10    458035.03

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 160
#----------------------------------------------------------------
#bytes      #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
0           1000          0.06         0.07         0.06
1           1000          0.47         0.56         0.52
2           1000          0.47         0.57         0.51
4           1000          0.48         0.56         0.52
8           1000          0.46         0.56         0.51
16          1000          0.47         0.57         0.52
32          1000          0.47         0.56         0.52
64          1000          0.47         0.57         0.52
128         1000          0.50         0.62         0.57
256         1000          0.58         0.68         0.63
512         1000          0.62         0.81         0.70
1024        1000          0.71         0.97         0.80
2048        1000          0.89         1.24         1.05
4096        1000          2.21         2.58         2.40
8192        1000          3.08         3.55         3.30
16384       1000          4.77         5.56         5.11
32768       1000          7.99         9.75         8.90
65536       640           15.81        19.35        17.69
131072      320           34.18        39.74        36.95
262144      160           71.72        80.37        76.06
524288      80            143.64       161.81       152.36
1048576     40            781.10       868.80       825.57
2097152     20            2594.30      2795.45      2672.58
4194304     10            5185.79      5451.20      5298.98
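For reference, here is a sketch of how these overrides can be combined on one command line. My understanding (an assumption on my part) is that the per-collective *_algorithm parameters are only honored when coll_tuned_use_dynamic_rules is enabled, so it is included here:

# Sketch: force the tuned collective algorithms that worked best here;
# coll_tuned_use_dynamic_rules is assumed to be needed for the overrides
# to take effect.
mpirun -np $NP -hostfile hosts \
    --mca btl openib,sm,self --bind-to-core -npernode 16 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_barrier_algorithm 6 \
    --mca coll_tuned_bcast_algorithm 3 \
    --mca coll_tuned_allgatherv_algorithm 5 \
    ./IMB-MPI1 -npmin $NP -mem 4G $COLL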
This time I only ran the tests on 160 processes, but I previously did more testing with 1.6 on different process counts (from 16 to 320), and those tuned parameters helped almost every time. I don't know what the default parameters are tuned for, but perhaps it would be a good idea to change the defaults for the kind of system I use. I can perform additional tests if necessary or give more information on the problems I've come across.

Regards,
Pavel Mezentsev.

2013/2/27 Jeff Squyres (jsquyres) <jsquy...@cisco.com>

> The goal is to release 1.7 (final) by the end of this week. New rc posted
> with fairly small changes:
>
>     http://www.open-mpi.org/software/ompi/v1.7/
>
> - Fix wrong header file / compilation error in bcol
> - Support MXM STREAM for isend and irecv
> - Make sure "mpirun <dirname>" fails with $status!=0
> - Bunches of cygwin minor fixes
> - Make sure the fortran compiler supports BIND(C) with LOGICAL for the F08
>   bindings
> - Fix --disable-mpi-io with the F08 bindings
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>