On 08/08/16 18:01, Nathan Hjelm wrote:
> On Aug 08, 2016, at 05:17 AM, Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote:
>> Dear Open MPI developers,
>> there is already a thread about 'sm BTL performace of the openmpi-2.0.0'
>> https://www.open-mpi.org/community/lists/devel/2016/07/19288.php
>> and we also see a 30% bandwidth loss, on communication *via InfiniBand*.
>>
>> And we also have a clue: the IB buffers seem not to be aligned in 2.0.0, in contrast to previous series (from at least 1.8.x). That means:
>> - if we use a simple wrapper mapping 'malloc' to the 32-byte-aligned variant, we get the full bandwidth using the same compiled binary; and
>> - there is nothing to grep in 'ompi_info -all | grep memalign' in 2.0.0, while in 1.10.3 there are the 'btl_openib_memalign' and 'btl_openib_memalign_threshold' parameters.
>>
>> => it seems the whole 'IB buffer alignment' part vanished in 2.0.0?
>>
>> Could we get the aligned IB buffers back in the 2.x series, please? It's about 30% of performance...
>
> Open MPI developers will discuss this next week. This support was removed when we moved away from using ptmalloc2 to detect free/munmap/etc. Hooking malloc is non-portable and can lead to a number of issues.
> In my opinion users should be handling the alignment of their own buffers
a) The issue is not in the application's or user's own buffers but in the internal Open MPI buffers, AFAIK. Did we not discuss the same about 4 years ago?
b) There is no chance to control the allocation from Fortran code at all (unless you use a nasty, dirty hack like the following, fed in via the LD_PRELOAD variable):

    #include <malloc.h>
    void* malloc(size_t size){
        return memalign(32,size);
    }
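For contrast with the LD_PRELOAD hack, here is a minimal sketch of how a C application that does control its own allocations could request 32-byte-aligned buffers through the standard posix_memalign call. The helper names alloc_aligned and is_aligned are made up for illustration, and this does nothing for Open MPI's internal buffers or for Fortran codes, which is the actual complaint here.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `size` bytes aligned to `alignment` bytes.
 * posix_memalign requires `alignment` to be a power of two and a multiple
 * of sizeof(void *). Returns NULL on failure; release with free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: 1 if `p` is aligned to `alignment` bytes, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A send buffer obtained with alloc_aligned(32, nbytes) can then be passed to MPI_Send as usual and released with free().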
> and if that is not possible can use their own hooks to provide an aligned malloc. That said, I don't see how aligning the buffers could possibly lead to a 30% performance increase
My error: it's not 30% but 'only' 15%, or 3059 MB/s vs. 2677 MB/s (the bigger value is also what older Open MPI versions and Intel MPI offer out of the box).
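As a quick cross-check of the percentage, the two bandwidth figures quoted above (3059 and 2677) can be compared in either direction. This is only a sketch of the arithmetic; the function names are made up for illustration, and the units cancel out.

```c
/* Percentage by which `slow` falls short of `fast`, relative to `fast`. */
double loss_percent(double fast, double slow)
{
    return 100.0 * (fast - slow) / fast;
}

/* Percentage gained when going from `slow` to `fast`, relative to `slow`. */
double gain_percent(double fast, double slow)
{
    return 100.0 * (fast - slow) / slow;
}
```

With fast = 3059 and slow = 2677 this gives a loss of about 12.5% relative to the faster run, or a gain of about 14.3% relative to the slower one, so "15%" is a slightly rounded-up figure.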
> for large messages. What is the typical message size? It might suggest that something else in Open MPI is not performing as it should. Can you provide a reproducer that shows the bandwidth drop?

Take any application able to measure the bandwidth between two nodes, e.g. the Intel MPI Benchmarks:
https://software.intel.com/en-us/articles/intel-mpi-benchmarks
(cf. output attached; we used IMB-MPI1 v3.2.3, but I think *any* application would measure the same, iff it is able to measure the bandwidth)
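For reading the attached output: the Mbytes/sec column of the PingPong tables can be recomputed from the #bytes and t[usec] columns. The sketch below assumes 1 Mbyte = 2^20 bytes, which is an assumption that happens to be consistent with the attached numbers; the function name is made up for illustration.

```c
/* Throughput in "Mbytes/sec" from a message size in bytes and a time in
 * microseconds, assuming 1 Mbyte = 2^20 bytes (assumption, see above).
 * Returns 0 for a non-positive time. */
double imb_mbytes_per_sec(double bytes, double t_usec)
{
    if (t_usec <= 0.0)
        return 0.0;
    return (bytes / 1048576.0) / (t_usec * 1e-6);
}
```

For the 536870912-byte PingPong rows, 191220.62 usec gives roughly 2677.5 Mbytes/sec and 167347.48 usec gives roughly 3059.5 Mbytes/sec, matching the two runs in the attachment.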
> If I can get a reproducer I may be able to improve the performance for 2.0.2 without re-adding the alignment code.

That would be great. Losing 15% of bandwidth is a no-go for us, effectively preventing us from using Open MPI (v2.x) at all...
>> Best
>> Paul
>>
>> P.S. btl_openib_get_alignment and btl_openib_put_alignment are '0' by default; setting them high did not change the behaviour...
>
> These are hardware constants used to indicate the alignment restrictions of the hardware. They cannot be changed.
>
>> --
>> Dipl.-Inform. Paul Kapinos - High Performance Computing,
>> RWTH Aachen University, IT Center
>> Seffenter Weg 23, D 52074 Aachen (Germany)
>> Tel: +49 241/80-24915
>
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
[linuxbmc0002.rz.RWTH-Aachen.DE:09306] MCW rank 0 is not bound (or bound to all available processors) [linuxbmc0003.rz.RWTH-Aachen.DE:01706] MCW rank 1 is not bound (or bound to all available processors) #--------------------------------------------------- # Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part #--------------------------------------------------- # Date : Tue Aug 9 17:53:55 2016 # Machine : x86_64 # System : Linux # Release : 3.10.0-327.22.2.el7.x86_64 # Version : #1 SMP Thu Jun 23 17:05:11 UTC 2016 # MPI Version : 3.1 # MPI Thread Environment: # New default behavior from Version 3.2 on: # the number of iterations per message size is cut down # dynamically when a certain run time (per message size sample) # is expected to be exceeded. Time limit is defined by variable # "SECS_PER_SAMPLE" (=> IMB_settings.h) # or through the flag => -time # Calling sequence was: # /rwthfs/rz/cluster/home/pk224850/SVN/mpifasttest/trunk/imb_3.2.3/src/IMB-MPI1 # Minimum message length in bytes: 0 # Maximum message length in bytes: 1073741824 # # MPI_Datatype : MPI_BYTE # MPI_Datatype for reductions : MPI_FLOAT # MPI_Op : MPI_SUM # # # List of Benchmarks to run: # PingPong # PingPing # Sendrecv # Exchange # Allreduce # Reduce # Reduce_scatter # Allgather # Allgatherv # Gather # Gatherv # Scatter # Scatterv # Alltoall # Alltoallv # Bcast # Barrier #--------------------------------------------------- # Benchmarking PingPong # #processes = 2 #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 1.46 0.00 1 1000 1.56 0.61 2 1000 1.53 1.25 4 1000 1.54 2.48 8 1000 1.56 4.88 16 1000 1.57 9.72 32 1000 1.64 18.63 64 1000 1.73 35.31 128 1000 2.85 42.82 256 1000 3.07 79.49 512 1000 3.43 142.27 1024 1000 4.06 240.31 2048 1000 5.33 366.29 4096 1000 6.49 601.60 8192 1000 9.15 853.38 16384 1000 11.19 1396.21 32768 1000 17.09 1828.33 65536 640 25.56 2445.35 131072 320 50.63 2468.67 262144 160 95.50 2617.86 524288 80 183.60 2723.31 1048576 40 
360.95 2770.48 2097152 20 715.85 2793.89 4194304 10 1423.85 2809.28 8388608 5 3002.02 2664.87 16777216 2 5989.68 2671.26 33554432 1 11960.66 2675.44 67108864 1 23907.31 2677.01 134217728 1 47815.23 2676.97 268435456 1 95623.32 2677.17 536870912 1 191220.62 2677.54 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #--------------------------------------------------- # Benchmarking PingPing # #processes = 2 #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 1.51 0.00 1 1000 1.66 0.58 2 1000 1.62 1.18 4 1000 1.64 2.33 8 1000 1.65 4.63 16 1000 1.67 9.16 32 1000 1.74 17.50 64 1000 1.91 31.92 128 1000 2.92 41.75 256 1000 3.20 76.29 512 1000 3.52 138.66 1024 1000 4.17 234.40 2048 1000 5.43 359.50 4096 1000 6.75 578.65 8192 1000 9.46 826.10 16384 1000 15.11 1034.04 32768 1000 22.58 1384.19 65536 640 32.81 1904.71 131072 320 55.88 2237.04 262144 160 101.87 2454.18 524288 80 193.25 2587.36 1048576 40 376.37 2656.94 2097152 20 743.13 2691.30 4194304 10 1480.27 2702.21 8388608 5 3264.24 2450.80 16777216 2 6573.31 2434.08 33554432 1 13161.32 2431.37 67108864 1 26357.10 2428.19 134217728 1 52784.61 2424.95 268435456 1 105606.24 2424.10 536870912 1 211129.92 2425.05 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #----------------------------------------------------------------------------- # Benchmarking Sendrecv # #processes = 2 #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 1.49 1.50 1.49 0.00 1 1000 1.56 1.56 1.56 1.22 2 1000 1.58 1.58 1.58 2.42 4 1000 1.60 1.60 1.60 4.76 8 1000 1.62 1.62 1.62 9.40 16 1000 1.61 1.61 1.61 18.97 32 1000 1.65 1.65 1.65 36.94 64 1000 1.78 1.78 1.78 68.39 128 1000 2.92 2.92 2.92 83.64 256 1000 3.14 3.14 3.14 155.33 512 1000 3.56 3.56 3.56 274.56 1024 1000 4.37 4.37 4.37 446.96 2048 1000 5.41 5.41 
5.41 721.82 4096 1000 6.75 6.76 6.76 1155.71 8192 1000 9.65 9.65 9.65 1618.68 16384 1000 15.16 15.16 15.16 2061.67 32768 1000 22.81 22.81 22.81 2740.01 65536 640 33.41 33.42 33.41 3740.68 131072 320 56.69 56.69 56.69 4409.83 262144 160 105.80 105.81 105.80 4725.46 524288 80 192.69 192.72 192.71 5188.88 1048576 40 376.09 376.09 376.09 5317.84 2097152 20 741.54 741.56 741.55 5394.03 4194304 10 1478.53 1478.63 1478.58 5410.42 8388608 5 3259.58 3259.96 3259.77 4908.03 16777216 2 6532.78 6534.03 6533.40 4897.44 33554432 1 13048.08 13050.67 13049.38 4903.96 67108864 1 26074.54 26075.98 26075.26 4908.73 134217728 1 52095.53 52098.10 52096.81 4913.81 268435456 1 104177.01 104179.67 104178.34 4914.59 536870912 1 208321.93 208324.00 208322.96 4915.42 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #----------------------------------------------------------------------------- # Benchmarking Exchange # #processes = 2 #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 1.81 1.81 1.81 0.00 1 1000 1.92 1.92 1.92 1.99 2 1000 1.92 1.92 1.92 3.97 4 1000 1.93 1.93 1.93 7.89 8 1000 1.99 1.99 1.99 15.35 16 1000 1.96 1.97 1.96 31.06 32 1000 2.07 2.07 2.07 58.86 64 1000 2.23 2.23 2.23 109.71 128 1000 3.44 3.44 3.44 142.09 256 1000 3.83 3.83 3.83 254.65 512 1000 4.31 4.31 4.31 453.37 1024 1000 5.43 5.43 5.43 719.74 2048 1000 6.96 6.96 6.96 1122.40 4096 1000 9.06 9.07 9.07 1723.12 8192 1000 13.40 13.41 13.40 2330.75 16384 1000 26.77 26.77 26.77 2334.28 32768 1000 40.85 40.85 40.85 3060.05 65536 640 62.84 62.84 62.84 3978.24 131072 320 108.59 108.59 108.59 4604.44 262144 160 201.15 201.15 201.15 4971.50 524288 80 384.99 385.00 385.00 5194.81 1048576 40 750.84 750.85 750.85 5327.30 2097152 20 1481.14 1481.15 1481.15 5401.19 4194304 10 3260.73 3260.74 3260.73 4906.86 8388608 5 6547.81 6548.19 6548.00 4886.84 16777216 2 13002.46 13003.71 
13003.08 4921.67 33554432 1 25863.56 25865.30 25864.43 4948.71 67108864 1 51678.95 51680.04 51679.50 4953.56 134217728 1 103395.91 103397.72 103396.81 4951.75 268435456 1 206690.26 206692.68 206691.47 4954.22 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #---------------------------------------------------------------- # Benchmarking Allreduce # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 4 1000 1.74 1.74 1.74 8 1000 1.77 1.77 1.77 16 1000 1.82 1.82 1.82 32 1000 1.81 1.81 1.81 64 1000 1.99 1.99 1.99 128 1000 3.16 3.16 3.16 256 1000 3.47 3.47 3.47 512 1000 3.83 3.83 3.83 1024 1000 4.71 4.72 4.71 2048 1000 6.20 6.20 6.20 4096 1000 7.96 7.96 7.96 8192 1000 11.57 11.58 11.57 16384 1000 21.99 22.00 21.99 32768 1000 38.50 38.50 38.50 65536 640 65.07 65.07 65.07 131072 320 106.24 106.24 106.24 262144 160 190.71 190.73 190.72 524288 80 356.12 356.15 356.13 1048576 40 682.92 682.92 682.92 2097152 20 1340.88 1340.89 1340.89 4194304 10 3031.72 3031.92 3031.82 8388608 5 6420.89 6421.28 6421.08 16777216 2 13081.92 13083.39 13082.66 33554432 1 25780.64 25781.74 25781.19 67108864 1 52888.72 52888.96 52888.84 134217728 1 104606.46 104607.86 104607.16 268435456 1 208152.94 208155.70 208154.32 536870912 1 416709.14 416710.93 416710.04 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #---------------------------------------------------------------- # Benchmarking Reduce # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 4 1000 1.61 1.61 1.61 8 1000 1.68 1.68 1.68 16 1000 1.71 1.71 1.71 32 1000 1.67 1.67 1.67 64 1000 1.84 1.84 1.84 128 1000 3.09 3.09 3.09 256 1000 
3.28 3.29 3.29 512 1000 3.94 3.94 3.94 1024 1000 4.66 4.66 4.66 2048 1000 6.13 6.14 6.14 4096 1000 7.87 7.88 7.87 8192 1000 11.37 11.38 11.38 16384 1000 16.96 16.96 16.96 32768 1000 27.27 27.28 27.27 65536 640 54.31 54.32 54.31 131072 320 98.94 99.00 98.97 262144 160 195.14 195.25 195.20 524288 80 385.06 385.30 385.18 1048576 40 762.85 763.34 763.09 2097152 20 1513.56 1514.48 1514.02 4194304 10 3047.19 3048.93 3048.06 8388608 5 6420.43 6425.36 6422.90 16777216 2 14997.11 15008.44 15002.78 33554432 1 25774.24 25796.64 25785.44 67108864 1 51298.07 51321.68 51309.87 134217728 1 102658.00 102680.95 102669.48 268435456 1 300808.10 300833.65 300820.88 536870912 1 430889.15 430915.25 430902.20 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #---------------------------------------------------------------- # Benchmarking Reduce_scatter # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 4 1000 0.60 0.62 0.61 8 1000 1.99 1.99 1.99 16 1000 2.07 2.07 2.07 32 1000 2.04 2.04 2.04 64 1000 2.07 2.08 2.08 128 1000 2.31 2.31 2.31 256 1000 3.52 3.52 3.52 512 1000 3.74 3.75 3.74 1024 1000 4.26 4.26 4.26 2048 1000 5.12 5.13 5.12 4096 1000 6.84 6.84 6.84 8192 1000 8.88 8.89 8.88 16384 1000 13.17 13.18 13.17 32768 1000 24.97 24.98 24.97 65536 640 41.82 41.83 41.82 131072 320 74.83 74.83 74.83 262144 160 139.89 139.95 139.92 524288 80 249.96 249.98 249.97 1048576 40 487.71 487.71 487.71 2097152 20 967.11 967.83 967.47 4194304 10 2087.58 2087.62 2087.60 8388608 5 4864.76 4867.89 4866.32 16777216 2 10562.90 10563.99 10563.44 33554432 1 25809.43 25942.92 25876.18 67108864 1 59388.52 59464.31 59426.42 134217728 1 117727.83 118096.61 117912.22 268435456 1 234241.70 234812.97 234527.33 536870912 1 469952.28 484283.08 477117.68 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 
#---------------------------------------------------------------- # Benchmarking Allgather # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 1 1000 1.67 1.67 1.67 2 1000 1.71 1.71 1.71 4 1000 1.77 1.77 1.77 8 1000 1.71 1.71 1.71 16 1000 1.72 1.72 1.72 32 1000 1.80 1.80 1.80 64 1000 1.97 1.97 1.97 128 1000 3.03 3.03 3.03 256 1000 3.35 3.36 3.36 512 1000 3.67 3.68 3.68 1024 1000 4.34 4.34 4.34 2048 1000 5.57 5.57 5.57 4096 1000 6.91 6.91 6.91 8192 1000 9.92 9.93 9.92 16384 1000 16.69 16.69 16.69 32768 1000 25.09 25.09 25.09 65536 640 37.56 37.56 37.56 131072 320 64.77 64.77 64.77 262144 160 119.40 119.40 119.40 524288 80 231.18 231.22 231.20 1048576 40 454.77 454.81 454.79 2097152 20 900.76 900.88 900.82 4194304 10 1800.53 1800.56 1800.55 8388608 5 4840.09 4860.34 4850.22 16777216 2 9983.52 10195.53 10089.52 33554432 1 19573.69 19952.66 19763.18 67108864 1 39134.74 39789.32 39462.03 134217728 1 78053.50 79460.80 78757.15 268435456 1 156635.13 158519.52 157577.32 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #---------------------------------------------------------------- # Benchmarking Allgatherv # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 1 1000 1.71 1.71 1.71 2 1000 1.70 1.70 1.70 4 1000 1.71 1.71 1.71 8 1000 1.74 1.74 1.74 16 1000 1.78 1.78 1.78 32 1000 1.84 1.85 1.85 64 1000 1.92 1.92 1.92 128 1000 3.06 3.07 3.07 256 1000 3.39 3.39 3.39 512 1000 3.72 3.72 3.72 1024 1000 4.38 4.38 4.38 2048 1000 5.61 5.61 5.61 4096 1000 6.96 6.96 6.96 8192 1000 10.22 10.23 10.22 16384 1000 16.53 16.53 16.53 32768 1000 24.61 24.61 24.61 65536 640 37.63 37.63 37.63 131072 320 65.32 65.33 65.32 262144 160 
119.77 119.77 119.77 524288 80 232.26 232.28 232.27 1048576 40 455.06 455.14 455.10 2097152 20 902.00 902.22 902.11 4194304 10 1796.02 1796.92 1796.47 8388608 5 4804.13 4817.73 4810.93 16777216 2 9950.96 10123.75 10037.35 33554432 1 19608.91 19823.93 19716.42 67108864 1 39237.51 39603.85 39420.68 134217728 1 78239.89 79094.10 78666.99 268435456 1 157711.72 168665.39 163188.56 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 int-overflow.; The production rank*size caused int overflow for given sample #---------------------------------------------------------------- # Benchmarking Gather # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 1 1000 1.58 1.58 1.58 2 1000 1.58 1.58 1.58 4 1000 1.64 1.64 1.64 8 1000 1.62 1.62 1.62 16 1000 1.67 1.67 1.67 32 1000 1.65 1.65 1.65 64 1000 1.84 1.84 1.84 128 1000 2.94 2.94 2.94 256 1000 3.20 3.20 3.20 512 1000 3.58 3.58 3.58 1024 1000 4.26 4.26 4.26 2048 1000 5.47 5.48 5.48 4096 1000 6.75 6.76 6.76 8192 1000 9.31 9.31 9.31 16384 1000 18.52 18.52 18.52 32768 1000 24.93 24.93 24.93 65536 640 38.18 38.18 38.18 131072 320 66.91 66.92 66.91 262144 160 120.62 120.63 120.62 524288 80 228.56 228.58 228.57 1048576 40 444.47 444.51 444.49 2097152 20 875.12 875.21 875.17 4194304 10 1748.89 1749.06 1748.98 8388608 5 4537.97 4538.07 4538.02 16777216 2 9408.75 9408.88 9408.81 33554432 1 18679.33 18680.00 18679.66 67108864 1 37195.04 37195.67 37195.35 134217728 1 74329.73 74330.59 74330.16 268435456 1 148173.14 148173.54 148173.34 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #---------------------------------------------------------------- # Benchmarking Gatherv # #processes = 2 
#---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.03 0.03 0.03 1 1000 1.55 1.55 1.55 2 1000 1.55 1.55 1.55 4 1000 1.60 1.60 1.60 8 1000 1.58 1.58 1.58 16 1000 1.59 1.59 1.59 32 1000 1.61 1.61 1.61 64 1000 1.80 1.80 1.80 128 1000 2.91 2.91 2.91 256 1000 3.16 3.17 3.16 512 1000 3.56 3.56 3.56 1024 1000 4.19 4.19 4.19 2048 1000 5.39 5.40 5.39 4096 1000 6.74 6.74 6.74 8192 1000 10.07 10.08 10.07 16384 1000 12.68 12.68 12.68 32768 1000 18.58 18.58 18.58 65536 640 30.23 30.23 30.23 131072 320 53.96 53.98 53.97 262144 160 103.28 103.38 103.33 524288 80 201.70 202.13 201.91 1048576 40 399.29 401.13 400.21 2097152 20 791.00 798.57 794.78 4194304 10 1587.14 1617.87 1602.50 8388608 5 3945.77 3945.92 3945.84 16777216 2 7788.37 9397.15 8592.76 33554432 1 18773.78 18775.31 18774.54 67108864 1 37320.55 37321.36 37320.96 134217728 1 74511.61 74512.86 74512.24 268435456 1 148712.22 148713.21 148712.72 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 int-overflow.; The production rank*size caused int overflow for given sample #---------------------------------------------------------------- # Benchmarking Scatter # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 1 1000 1.58 1.58 1.58 2 1000 1.58 1.58 1.58 4 1000 1.59 1.60 1.59 8 1000 1.62 1.62 1.62 16 1000 1.66 1.66 1.66 32 1000 1.69 1.69 1.69 64 1000 1.85 1.85 1.85 128 1000 2.89 2.90 2.89 256 1000 3.28 3.29 3.29 512 1000 3.57 3.58 3.58 1024 1000 4.23 4.23 4.23 2048 1000 5.43 5.44 5.43 4096 1000 6.69 6.70 6.70 8192 1000 9.45 9.45 9.45 16384 1000 12.89 12.89 12.89 32768 1000 19.12 19.12 19.12 65536 640 31.81 31.82 31.81 131072 320 57.84 57.88 57.86 262144 160 110.87 111.07 110.97 524288 80 217.04 217.87 217.46 1048576 40 429.50 432.92 431.21 2097152 20 854.20 868.20 861.20 4194304 10 
1715.08 1771.33 1743.21 8388608 5 3909.74 3910.26 3910.00 16777216 2 7764.00 9328.95 8546.48 33554432 1 18857.17 18861.18 18859.18 67108864 1 37479.93 37483.65 37481.79 134217728 1 74866.45 74870.84 74868.65 268435456 1 149457.58 149460.70 149459.14 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #---------------------------------------------------------------- # Benchmarking Scatterv # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.03 0.03 0.03 1 1000 1.57 1.57 1.57 2 1000 1.60 1.60 1.60 4 1000 1.60 1.60 1.60 8 1000 1.63 1.63 1.63 16 1000 1.69 1.69 1.69 32 1000 1.72 1.72 1.72 64 1000 1.83 1.83 1.83 128 1000 3.11 3.12 3.11 256 1000 3.30 3.31 3.30 512 1000 3.90 3.90 3.90 1024 1000 4.43 4.43 4.43 2048 1000 5.68 5.69 5.69 4096 1000 7.15 7.15 7.15 8192 1000 10.19 10.20 10.20 16384 1000 13.35 13.35 13.35 32768 1000 19.90 19.91 19.90 65536 640 33.31 33.32 33.31 131072 320 60.93 60.98 60.95 262144 160 115.83 116.04 115.94 524288 80 226.96 227.79 227.38 1048576 40 449.16 452.61 450.88 2097152 20 893.68 907.70 900.69 4194304 10 1791.28 1847.46 1819.37 8388608 5 4242.52 4243.09 4242.80 16777216 2 8412.80 9978.21 9195.51 33554432 1 21500.62 21505.39 21503.00 67108864 1 42928.61 42933.14 42930.88 134217728 1 85647.98 85651.30 85649.64 268435456 1 175052.35 175053.33 175052.84 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 int-overflow.; The production rank*size caused int overflow for given sample #---------------------------------------------------------------- # Benchmarking Alltoall # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 1 1000 1.75 1.75 1.75 2 1000 
1.74 1.74 1.74 4 1000 1.89 1.89 1.89 8 1000 1.77 1.77 1.77 16 1000 1.79 1.79 1.79 32 1000 1.85 1.85 1.85 64 1000 2.00 2.00 2.00 128 1000 3.09 3.09 3.09 256 1000 3.42 3.43 3.42 512 1000 3.67 3.67 3.67 1024 1000 4.45 4.45 4.45 2048 1000 5.67 5.68 5.67 4096 1000 7.05 7.05 7.05 8192 1000 10.17 10.18 10.18 16384 1000 16.73 16.73 16.73 32768 1000 23.93 23.93 23.93 65536 640 37.12 37.13 37.13 131072 320 63.51 63.51 63.51 262144 160 118.70 118.71 118.71 524288 80 230.05 230.05 230.05 1048576 40 453.39 453.50 453.44 2097152 20 896.89 896.93 896.91 4194304 10 1797.63 1798.11 1797.87 8388608 5 4831.06 4857.86 4844.46 16777216 2 9962.83 10200.71 10081.77 33554432 1 19717.01 20025.93 19871.47 67108864 1 39188.30 39865.49 39526.90 134217728 1 78466.05 79718.43 79092.24 268435456 1 156760.48 158618.05 157689.26 536870912 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 out-of-mem.; needed X= 4.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #---------------------------------------------------------------- # Benchmarking Alltoallv # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.06 0.06 0.06 1 1000 1.70 1.70 1.70 2 1000 1.70 1.70 1.70 4 1000 1.71 1.71 1.71 8 1000 1.72 1.73 1.73 16 1000 1.78 1.78 1.78 32 1000 1.76 1.77 1.77 64 1000 2.11 2.11 2.11 128 1000 3.21 3.21 3.21 256 1000 3.47 3.47 3.47 512 1000 3.86 3.86 3.86 1024 1000 4.72 4.72 4.72 2048 1000 6.04 6.05 6.04 4096 1000 7.64 7.64 7.64 8192 1000 11.10 11.10 11.10 16384 1000 16.62 16.62 16.62 32768 1000 25.57 25.57 25.57 65536 640 37.25 37.25 37.25 131072 320 64.55 64.56 64.55 262144 160 120.31 120.31 120.31 524288 80 232.68 232.68 232.68 1048576 40 458.34 458.36 458.35 2097152 20 906.92 906.93 906.93 4194304 10 1813.65 1813.70 1813.68 8388608 5 5433.02 5433.65 5433.34 16777216 2 11423.07 11423.71 11423.39 33554432 1 22488.74 22491.24 22489.99 67108864 1 44869.19 
44872.25 44870.72 134217728 1 89713.75 89717.64 89715.70 268435456 1 159073.36 159075.73 159074.55 536870912 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) 1073741824 int-overflow.; The production rank*size caused int overflow for given sample #---------------------------------------------------------------- # Benchmarking Bcast # #processes = 2 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.02 0.02 0.02 1 1000 1.74 1.75 1.74 2 1000 1.74 1.74 1.74 4 1000 1.77 1.77 1.77 8 1000 1.78 1.78 1.78 16 1000 1.79 1.80 1.80 32 1000 1.82 1.82 1.82 64 1000 1.94 1.95 1.94 128 1000 3.04 3.05 3.04 256 1000 3.25 3.25 3.25 512 1000 3.65 3.65 3.65 1024 1000 4.29 4.30 4.29 2048 1000 5.42 5.42 5.42 4096 1000 7.04 7.04 7.04 8192 1000 10.08 10.09 10.08 16384 1000 15.18 15.19 15.18 32768 1000 27.86 27.86 27.86 65536 640 46.96 46.96 46.96 131072 320 80.56 80.56 80.56 262144 160 151.53 151.53 151.53 524288 80 206.37 206.37 206.37 1048576 40 412.97 412.97 412.97 2097152 20 825.55 825.56 825.56 4194304 10 1674.03 1674.04 1674.04 8388608 5 3487.58 3487.98 3487.78 16777216 2 7010.10 7010.13 7010.12 33554432 1 14021.40 14023.36 14022.38 67108864 1 28047.28 28049.85 28048.56 134217728 1 56017.18 56019.01 56018.09 268435456 1 112067.94 112070.31 112069.13 536870912 1 313946.36 313947.19 313946.77 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #--------------------------------------------------- # Benchmarking Barrier # #processes = 2 #--------------------------------------------------- #repetitions t_min[usec] t_max[usec] t_avg[usec] 1000 1.55 1.55 1.55 # All processes entering MPI_Finalize
[linuxbmc0003.rz.RWTH-Aachen.DE:02291] MCW rank 1 is not bound (or bound to all available processors) [linuxbmc0002.rz.RWTH-Aachen.DE:09875] MCW rank 0 is not bound (or bound to all available processors) #--------------------------------------------------- # Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part #--------------------------------------------------- # Date : Tue Aug 9 17:55:17 2016 # Machine : x86_64 # System : Linux # Release : 3.10.0-327.22.2.el7.x86_64 # Version : #1 SMP Thu Jun 23 17:05:11 UTC 2016 # MPI Version : 3.1 # MPI Thread Environment: # New default behavior from Version 3.2 on: # the number of iterations per message size is cut down # dynamically when a certain run time (per message size sample) # is expected to be exceeded. Time limit is defined by variable # "SECS_PER_SAMPLE" (=> IMB_settings.h) # or through the flag => -time # Calling sequence was: # IMB-MPI1 # Minimum message length in bytes: 0 # Maximum message length in bytes: 1073741824 # # MPI_Datatype : MPI_BYTE # MPI_Datatype for reductions : MPI_FLOAT # MPI_Op : MPI_SUM # # # List of Benchmarks to run: # PingPong # PingPing # Sendrecv # Exchange # Allreduce # Reduce # Reduce_scatter # Allgather # Allgatherv # Gather # Gatherv # Scatter # Scatterv # Alltoall # Alltoallv # Bcast # Barrier #--------------------------------------------------- # Benchmarking PingPong # #processes = 2 #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 1.46 0.00 1 1000 1.54 0.62 2 1000 1.53 1.25 4 1000 1.53 2.49 8 1000 1.56 4.90 16 1000 1.57 9.72 32 1000 1.59 19.15 64 1000 1.75 34.85 128 1000 2.87 42.59 256 1000 3.10 78.84 512 1000 3.47 140.72 1024 1000 4.25 229.79 2048 1000 5.40 361.52 4096 1000 6.47 604.02 8192 1000 9.90 789.43 16384 1000 11.00 1420.24 32768 1000 16.21 1928.15 65536 640 25.42 2458.28 131072 320 49.64 2518.37 262144 160 83.02 3011.19 524288 80 160.29 3119.43 1048576 40 314.81 3176.49 2097152 20 624.03 3204.96 4194304 10 1245.44 3211.71 
8388608 5 2628.47 3043.59 16777216 2 5251.32 3046.86 33554432 1 10487.20 3051.34 67108864 1 20965.70 3052.60 134217728 1 41922.21 3053.27 268435456 1 83828.53 3053.85 536870912 1 167347.48 3059.50 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #--------------------------------------------------- # Benchmarking PingPing # #processes = 2 #--------------------------------------------------- #bytes #repetitions t[usec] Mbytes/sec 0 1000 1.52 0.00 1 1000 1.62 0.59 2 1000 1.68 1.14 4 1000 1.62 2.35 8 1000 1.64 4.64 16 1000 1.67 9.13 32 1000 1.70 17.90 64 1000 1.85 32.96 128 1000 2.94 41.47 256 1000 3.20 76.28 512 1000 3.54 137.86 1024 1000 4.37 223.40 2048 1000 5.60 348.51 4096 1000 6.84 571.12 8192 1000 9.69 806.13 16384 1000 13.94 1120.99 32768 1000 19.80 1578.24 65536 640 29.21 2139.66 131072 320 49.11 2545.23 262144 160 89.14 2804.73 524288 80 169.17 2955.67 1048576 40 329.09 3038.69 2097152 20 649.83 3077.74 4194304 10 1303.56 3068.51 8388608 5 2898.31 2760.23 16777216 2 5828.85 2744.97 33554432 1 11763.74 2720.22 67108864 1 23523.38 2720.70 134217728 1 47021.75 2722.14 268435456 1 94049.59 2721.97 536870912 1 188083.74 2722.19 1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h) #----------------------------------------------------------------------------- # Benchmarking Sendrecv # #processes = 2 #----------------------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec 0 1000 1.50 1.50 1.50 0.00 1 1000 1.55 1.55 1.55 1.23 2 1000 1.59 1.59 1.59 2.40 4 1000 1.55 1.56 1.55 4.90 8 1000 1.58 1.58 1.58 9.66 16 1000 1.59 1.59 1.59 19.18 32 1000 1.61 1.61 1.61 37.81 64 1000 1.76 1.76 1.76 69.41 128 1000 2.92 2.92 2.92 83.67 256 1000 3.14 3.14 3.14 155.48 512 1000 3.59 3.59 3.59 271.80 1024 1000 4.41 4.41 4.41 442.69 2048 1000 5.55 5.55 5.55 703.63 4096 1000 6.81 6.81 6.81 1147.26 8192 1000 9.61 9.62 9.62 
1624.49
     16384   1000      14.06      14.06      14.06   2222.63
     32768   1000      20.56      20.56      20.56   3039.80
     65536    640      32.78      32.78      32.78   3813.02
    131072    320      52.44      52.45      52.44   4766.86
    262144    160      92.78      92.79      92.79   5388.33
    524288     80     172.26     172.28     172.27   5804.38
   1048576     40     331.17     331.22     331.19   6038.26
   2097152     20     653.12     653.23     653.17   6123.45
   4194304     10    1301.91    1301.96    1301.94   6144.58
   8388608      5    2908.51    2908.94    2908.73   5500.29
  16777216      2    5815.76    5815.99    5815.88   5502.07
  33554432      1   11625.03   11626.63   11625.83   5504.60
  67108864      1   23242.01   23242.12   23242.07   5507.24
 134217728      1   46482.81   46484.26   46483.54   5507.24
 268435456      1   92950.60   92952.30   92951.45   5508.20
 536870912      1  185887.85  185889.38  185888.61   5508.65
1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec
         0   1000       1.80       1.80       1.80      0.00
         1   1000       1.91       1.91       1.91      2.00
         2   1000       1.97       1.97       1.97      3.86
         4   1000       1.89       1.89       1.89      8.07
         8   1000       1.97       1.97       1.97     15.45
        16   1000       1.94       1.94       1.94     31.50
        32   1000       1.96       1.96       1.96     62.15
        64   1000       2.16       2.16       2.16    112.89
       128   1000       3.47       3.47       3.47    140.82
       256   1000       3.88       3.88       3.88    251.93
       512   1000       4.52       4.52       4.52    432.20
      1024   1000       5.85       5.85       5.85    667.85
      2048   1000       7.26       7.26       7.26   1075.60
      4096   1000       9.32       9.32       9.32   1676.19
      8192   1000      13.59      13.60      13.59   2298.10
     16384   1000      23.38      23.38      23.38   2673.16
     32768   1000      35.61      35.61      35.61   3510.34
     65536    640      55.02      55.02      55.02   4543.73
    131072    320      95.56      95.56      95.56   5232.27
    262144    160     176.98     176.99     176.99   5649.95
    524288     80     337.04     337.06     337.05   5933.61
   1048576     40     657.37     657.42     657.40   6084.40
   2097152     20    1300.61    1300.63    1300.62   6150.88
   4194304     10    2903.56    2903.58    2903.57   5510.44
   8388608      5    5827.28    5827.44    5827.36   5491.27
  16777216      2   11619.34   11619.67   11619.50   5507.90
  33554432      1   23177.36   23179.04   23178.20   5522.23
  67108864      1   46351.41   46351.62   46351.51   5523.00
 134217728      1   92675.65   92676.69   92676.17   5524.58
 268435456      1  185323.90  185325.96  185324.93   5525.40
 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.02       0.02
         4   1000       1.85       1.86       1.85
         8   1000       1.84       1.84       1.84
        16   1000       1.94       1.95       1.94
        32   1000       1.85       1.85       1.85
        64   1000       1.99       1.99       1.99
       128   1000       3.19       3.19       3.19
       256   1000       3.59       3.59       3.59
       512   1000       3.96       3.96       3.96
      1024   1000       4.81       4.81       4.81
      2048   1000       6.34       6.34       6.34
      4096   1000       8.25       8.25       8.25
      8192   1000      11.99      11.99      11.99
     16384   1000      22.21      22.22      22.22
     32768   1000      36.45      36.45      36.45
     65536    640      60.64      60.65      60.64
    131072    320     100.70     100.70     100.70
    262144    160     181.76     181.77     181.76
    524288     80     333.34     333.36     333.35
   1048576     40     633.79     633.84     633.82
   2097152     20    1242.50    1242.57    1242.53
   4194304     10    2832.29    2832.42    2832.35
   8388608      5    6016.40    6016.60    6016.50
  16777216      2   12099.67   12099.77   12099.72
  33554432      1   24329.32   24330.46   24329.89
  67108864      1   49616.33   49618.82   49617.58
 134217728      1   98070.83   98074.18   98072.50
 268435456      1  195723.88  195728.23  195726.05
 536870912      1  391742.23  391749.71  391745.97
1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.02       0.02
         4   1000       1.61       1.61       1.61
         8   1000       1.65       1.65       1.65
        16   1000       1.61       1.61       1.61
        32   1000       1.68       1.68       1.68
        64   1000       1.83       1.83       1.83
       128   1000       3.05       3.05       3.05
       256   1000       3.32       3.33       3.32
       512   1000       4.03       4.03       4.03
      1024   1000       4.75       4.75       4.75
      2048   1000       6.15       6.16       6.15
      4096   1000       7.93       7.94       7.93
      8192   1000      11.98      11.99      11.98
     16384   1000      15.99      16.00      15.99
     32768   1000      26.50      26.51      26.51
     65536    640      51.90      51.91      51.90
    131072    320      93.67      93.73      93.70
    262144    160     183.60     183.72     183.66
    524288     80     364.30     364.54     364.42
   1048576     40     722.11     722.59     722.35
   2097152     20    1433.98    1434.90    1434.44
   4194304     10    2876.14    2878.00    2877.07
   8388608      5    6032.00    6036.54    6034.27
  16777216      2   14185.73   14198.16   14191.95
  33554432      1   24265.43   24289.55   24277.49
  67108864      1   48788.46   48814.07   48801.27
 134217728      1   97556.17   97578.18   97567.18
 268435456      1  286223.04  286250.44  286236.74
 536870912      1  397949.88  397974.35  397962.11
1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.02       0.02
         4   1000     395.07     395.14     395.11
         8   1000       2.36       2.36       2.36
        16   1000       4.38       4.38       4.38
        32   1000       2.25       2.25       2.25
        64   1000       2.33       2.33       2.33
       128   1000       2.83       2.83       2.83
       256   1000       3.91       3.91       3.91
       512   1000       4.16       4.16       4.16
      1024   1000       4.53       4.53       4.53
      2048   1000       5.58       5.58       5.58
      4096   1000       7.05       7.05       7.05
      8192   1000       8.93       8.93       8.93
     16384   1000      13.16      13.17      13.17
     32768   1000      24.79      24.79      24.79
     65536    640      40.32      40.32      40.32
    131072    320      71.34      71.34      71.34
    262144    160     129.80     129.82     129.81
    524288     80     249.79     249.82     249.81
   1048576     40     485.74     485.78     485.76
   2097152     20     957.96     958.02     957.99
   4194304     10    2070.91    2073.72    2072.32
   8388608      5    4868.46    4873.40    4870.93
  16777216      2   10607.77   10636.53   10622.15
  33554432      1   25022.79   25125.70   25074.24
  67108864      1   58039.97   58292.73   58166.35
 134217728      1  115029.46  115583.71  115306.59
 268435456      1  228192.78  228342.56  228267.67
 536870912      1  456598.00  456636.41  456617.20
1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.02       0.02
         1   1000       1.63       1.63       1.63
         2   1000       1.65       1.65       1.65
         4   1000       1.68       1.68       1.68
         8   1000       1.66       1.66       1.66
        16   1000       1.67       1.67       1.67
        32   1000       1.75       1.75       1.75
        64   1000       1.84       1.84       1.84
       128   1000       3.07       3.07       3.07
       256   1000       3.29       3.29       3.29
       512   1000       3.62       3.63       3.63
      1024   1000       4.51       4.51       4.51
      2048   1000       5.78       5.79       5.78
      4096   1000       7.07       7.08       7.08
      8192   1000      10.43      10.43      10.43
     16384   1000      15.34      15.34      15.34
     32768   1000      22.90      22.91      22.90
     65536    640      34.78      34.78      34.78
    131072    320      58.62      58.62      58.62
    262144    160     107.86     107.87     107.87
    524288     80     207.26     207.28     207.27
   1048576     40     407.00     407.04     407.02
   2097152     20     804.76     804.87     804.82
   4194304     10    1620.13    1626.82    1623.47
   8388608      5    4410.67    4415.92    4413.29
  16777216      2    9020.44    9050.17    9035.30
  33554432      1   17975.19   18034.62   18004.90
  67108864      1   35964.28   36103.99   36034.14
 134217728      1   72166.75   72232.84   72199.80
 268435456      1  144603.39  144661.73  144632.56
 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.02       0.02
         1   1000       1.65       1.65       1.65
         2   1000       1.64       1.64       1.64
         4   1000       1.74       1.74       1.74
         8   1000       1.67       1.67       1.67
        16   1000       1.72       1.72       1.72
        32   1000       1.75       1.75       1.75
        64   1000       1.90       1.90       1.90
       128   1000       3.03       3.03       3.03
       256   1000       3.29       3.29       3.29
       512   1000       3.68       3.68       3.68
      1024   1000       4.36       4.36       4.36
      2048   1000       5.79       5.80       5.79
      4096   1000       7.05       7.05       7.05
      8192   1000      10.42      10.42      10.42
     16384   1000      15.34      15.34      15.34
     32768   1000      22.77      22.77      22.77
     65536    640      33.65      33.66      33.66
    131072    320      57.36      57.36      57.36
    262144    160     106.87     106.89     106.88
    524288     80     206.39     206.43     206.41
   1048576     40     406.06     406.10     406.08
   2097152     20     804.78     804.91     804.84
   4194304     10    1621.92    1622.71    1622.32
   8388608      5    4404.10    4411.32    4407.71
  16777216      2    9046.76    9081.70    9064.23
  33554432      1   18088.01   18133.40   18110.70
  67108864      1   36267.74   36423.60   36345.67
 134217728      1   72719.73   72803.31   72761.52
 268435456      1  145702.33  145940.17  145821.25
 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 int-overflow.; The production rank*size caused int overflow for given sample

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.03       0.02
         1   1000       1.81       1.81       1.81
         2   1000       1.57       1.58       1.58
         4   1000       1.58       1.58       1.58
         8   1000       1.56       1.56       1.56
        16   1000       1.57       1.57       1.57
        32   1000       1.60       1.60       1.60
        64   1000       1.79       1.79       1.79
       128   1000       2.87       2.87       2.87
       256   1000       3.10       3.10       3.10
       512   1000       3.52       3.52       3.52
      1024   1000       4.20       4.20       4.20
      2048   1000       5.66       5.66       5.66
      4096   1000       6.73       6.74       6.73
      8192   1000       9.46       9.47       9.47
     16384   1000      17.55      17.56      17.55
     32768   1000      23.41      23.41      23.41
     65536    640      35.29      35.29      35.29
    131072    320      60.65      60.66      60.66
    262144    160     108.79     108.81     108.80
    524288     80     205.89     205.92     205.91
   1048576     40     401.60     401.65     401.63
   2097152     20     787.30     787.40     787.35
   4194304     10    1577.99    1578.13    1578.06
   8388608      5    4169.94    4169.97    4169.96
  16777216      2    8481.38    8481.75    8481.57
  33554432      1   16864.78   16865.49   16865.13
  67108864      1   34007.05   34008.62   34007.83
 134217728      1   67881.75   67882.01   67881.88
 268435456      1  135207.60  135207.70  135207.65
 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Gatherv
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.03       0.03       0.03
         1   1000       1.53       1.53       1.53
         2   1000       1.54       1.54       1.54
         4   1000       1.56       1.56       1.56
         8   1000       1.56       1.56       1.56
        16   1000       1.57       1.57       1.57
        32   1000       1.60       1.60       1.60
        64   1000       1.78       1.78       1.78
       128   1000       2.89       2.89       2.89
       256   1000       3.18       3.19       3.18
       512   1000       3.55       3.55       3.55
      1024   1000       4.28       4.28       4.28
      2048   1000       5.65       5.65       5.65
      4096   1000       6.74       6.75       6.75
      8192   1000      10.19      10.20      10.19
     16384   1000      11.62      11.62      11.62
     32768   1000      16.84      16.84      16.84
     65536    640      27.38      27.38      27.38
    131072    320      48.60      48.61      48.60
    262144    160      91.97      92.07      92.02
    524288     80     179.19     179.62     179.40
   1048576     40     353.62     355.46     354.54
   2097152     20     702.12     709.81     705.97
   4194304     10    1413.33    1444.49    1428.91
   8388608      5    3534.81    3534.84    3534.82
  16777216      2    6868.49    8448.91    7658.70
  33554432      1   16888.74   16889.55   16889.14
  67108864      1   33872.72   33873.56   33873.14
 134217728      1   67713.43   67714.82   67714.13
 268435456      1  134997.48  134998.46  134997.97
 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 int-overflow.; The production rank*size caused int overflow for given sample

#----------------------------------------------------------------
# Benchmarking Scatter
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.02       0.02
         1   1000       1.53       1.53       1.53
         2   1000       1.56       1.57       1.56
         4   1000       1.56       1.57       1.56
         8   1000       1.55       1.56       1.55
        16   1000       1.56       1.56       1.56
        32   1000       1.63       1.63       1.63
        64   1000       1.73       1.74       1.73
       128   1000       2.87       2.87       2.87
       256   1000       3.13       3.14       3.13
       512   1000       3.57       3.57       3.57
      1024   1000       4.29       4.30       4.30
      2048   1000       5.83       5.83       5.83
      4096   1000       6.92       6.93       6.93
      8192   1000      10.02      10.03      10.03
     16384   1000      11.85      11.85      11.85
     32768   1000      17.41      17.41      17.41
     65536    640      28.73      28.74      28.73
    131072    320      52.23      52.27      52.25
    262144    160      99.27      99.46      99.37
    524288     80     194.18     195.02     194.60
   1048576     40     384.86     388.29     386.58
   2097152     20     764.66     778.64     771.65
   4194304     10    1536.28    1592.13    1564.20
   8388608      5    3484.33    3484.75    3484.54
  16777216      2    6848.07    8452.64    7650.35
  33554432      1   16835.19   16839.15   16837.17
  67108864      1   33844.41   33847.76   33846.09
 134217728      1   67598.47   67602.85   67600.66
 268435456      1  135169.17  135172.16  135170.66
 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 out-of-mem.; needed X= 3.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.03       0.03       0.03
         1   1000       1.60       1.60       1.60
         2   1000       1.61       1.62       1.62
         4   1000       1.54       1.54       1.54
         8   1000       1.57       1.57       1.57
        16   1000       1.58       1.58       1.58
        32   1000       1.60       1.60       1.60
        64   1000       1.74       1.75       1.74
       128   1000       2.87       2.87       2.87
       256   1000       3.16       3.16       3.16
       512   1000       3.54       3.55       3.55
      1024   1000       4.25       4.25       4.25
      2048   1000       5.60       5.61       5.61
      4096   1000       6.79       6.80       6.80
      8192   1000       9.87       9.88       9.88
     16384   1000      11.92      11.92      11.92
     32768   1000      17.42      17.42      17.42
     65536    640      28.77      28.78      28.77
    131072    320      52.24      52.29      52.27
    262144    160      99.44      99.63      99.53
    524288     80     194.30     195.14     194.72
   1048576     40     384.87     388.32     386.60
   2097152     20     764.84     778.75     771.79
   4194304     10    1533.88    1590.26    1562.07
   8388608      5    3487.85    3488.08    3487.97
  16777216      2    6843.44    8456.57    7650.01
  33554432      1   16847.69   16851.27   16849.48
  67108864      1   33838.03   33841.85   33839.94
 134217728      1   67622.83   67626.50   67624.66
 268435456      1  135146.00  135149.67  135147.84
 536870912 out-of-mem.; needed X= 1.501 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 int-overflow.; The production rank*size caused int overflow for given sample

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.02       0.02
         1   1000       1.69       1.69       1.69
         2   1000       1.71       1.71       1.71
         4   1000       1.68       1.68       1.68
         8   1000       1.71       1.71       1.71
        16   1000       1.74       1.74       1.74
        32   1000       1.76       1.76       1.76
        64   1000       1.90       1.90       1.90
       128   1000       3.08       3.08       3.08
       256   1000       3.25       3.26       3.26
       512   1000       3.64       3.65       3.65
      1024   1000       4.36       4.36       4.36
      2048   1000       5.79       5.80       5.79
      4096   1000       7.10       7.10       7.10
      8192   1000      10.17      10.17      10.17
     16384   1000      15.36      15.36      15.36
     32768   1000      21.75      21.75      21.75
     65536    640      33.29      33.29      33.29
    131072    320      57.44      57.45      57.44
    262144    160     106.66     106.67     106.67
    524288     80     206.17     206.19     206.18
   1048576     40     405.88     405.88     405.88
   2097152     20     804.58     804.74     804.66
   4194304     10    1616.19    1616.23    1616.21
   8388608      5    4377.05    4379.26    4378.16
  16777216      2    9049.75    9084.95    9067.35
  33554432      1   18100.35   18131.61   18115.98
  67108864      1   36402.11   36538.71   36470.41
 134217728      1   72699.81   72789.45   72744.63
 268435456      1  146759.77  146876.75  146818.26
 536870912 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 out-of-mem.; needed X= 4.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#----------------------------------------------------------------
# Benchmarking Alltoallv
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.06       0.07       0.06
         1   1000       1.68       1.68       1.68
         2   1000       1.67       1.67       1.67
         4   1000       1.67       1.67       1.67
         8   1000       1.71       1.71       1.71
        16   1000       1.75       1.75       1.75
        32   1000       1.74       1.74       1.74
        64   1000       1.88       1.88       1.88
       128   1000       3.10       3.10       3.10
       256   1000       3.26       3.26       3.26
       512   1000       3.71       3.71       3.71
      1024   1000       4.55       4.55       4.55
      2048   1000       5.82       5.82       5.82
      4096   1000       7.14       7.14       7.14
      8192   1000      10.25      10.26      10.26
     16384   1000      15.38      15.38      15.38
     32768   1000      23.88      23.89      23.88
     65536    640      34.68      34.68      34.68
    131072    320      58.76      58.77      58.77
    262144    160     108.25     108.27     108.26
    524288     80     207.42     207.45     207.43
   1048576     40     407.42     407.47     407.44
   2097152     20     804.55     804.67     804.61
   4194304     10    1618.51    1618.72    1618.62
   8388608      5    4386.88    4387.30    4387.09
  16777216      2    9050.90    9052.14    9051.52
  33554432      1   18005.61   18006.49   18006.05
  67108864      1   36255.91   36255.96   36255.93
 134217728      1   72249.89   72250.78   72250.34
 268435456      1  144276.12  144276.25  144276.18
 536870912 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)
1073741824 int-overflow.; The production rank*size caused int overflow for given sample

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 2
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
         0   1000       0.02       0.07       0.04
         1   1000       1.75       1.75       1.75
         2   1000       1.84       1.84       1.84
         4   1000       1.75       1.75       1.75
         8   1000       1.78       1.78       1.78
        16   1000       1.80       1.80       1.80
        32   1000       1.81       1.81       1.81
        64   1000       1.99       2.00       2.00
       128   1000       3.08       3.08       3.08
       256   1000       3.39       3.40       3.40
       512   1000       3.77       3.78       3.78
      1024   1000       4.38       4.38       4.38
      2048   1000       5.59       5.60       5.59
      4096   1000       7.69       7.69       7.69
      8192   1000      10.43      10.44      10.44
     16384   1000      15.79      15.80      15.80
     32768   1000      27.51      27.51      27.51
     65536    640      45.65      45.65      45.65
    131072    320      77.84      77.84      77.84
    262144    160     147.23     147.24     147.23
    524288     80     183.27     183.27     183.27
   1048576     40     366.69     366.70     366.69
   2097152     20     734.03     734.04     734.04
   4194304     10    1489.61    1489.68    1489.65
   8388608      5    3115.39    3115.81    3115.60
  16777216      2    6231.69    6231.74    6231.71
  33554432      1   12438.42   12441.08   12439.75
  67108864      1   24900.72   24902.75   24901.73
 134217728      1   49817.79   49819.92   49818.86
 268435456      1  151423.31  151424.52  151423.92
 536870912      1  305391.52  305392.31  305391.92
1073741824 out-of-mem.; needed X= 2.001 GB; use flag "-mem X" or MAX_MEM_USAGE>=X (IMB_mem_info.h)

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 2
#---------------------------------------------------
#repetitions t_min[usec] t_max[usec] t_avg[usec]
        1000       1.52       1.52       1.52

# All processes entering MPI_Finalize