After installing UCX 1.5.0 and OpenMPI 4.0.1 compiled for UCX and without verbs (full details below), my NetPIPE benchmark is reporting message failures for some message sizes above 300 KB. There are no failures when I benchmark with a non-UCX (verbs) build of OpenMPI 4.0.1, and no failures when I test the UCX build with --mca btl tcp,self. The failures show up when testing both QDR IB and 40 GbE networks. NetPIPE always checks the first and last bytes of each message, and the --integrity option does a full check of every byte. That full check shows that in the failing cases the message is not being received at all.
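The difference between the two checks matters when reading the failure counts in the runs below. Here is a minimal C sketch of the idea. It is not NetPIPE's code: the fill pattern, the seed parameter and the names fill_pattern, check_first_last and check_all_bytes are hypothetical. A receive buffer that still holds stale data fails at most 2 checks per message under a first/last-byte test, but fails at nearly every byte under a full scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern (not NetPIPE's): byte i holds (seed + i) mod 256,
 * so a buffer left over from a message with a different seed differs at
 * every byte. */
static uint8_t pattern_byte(size_t i, uint8_t seed)
{
    return (uint8_t)(seed + i);
}

/* Sender side: fill the outgoing buffer with the pattern. */
void fill_pattern(uint8_t *buf, size_t len, uint8_t seed)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(i, seed);
}

/* Default-mode style check: only the first and last bytes are compared,
 * so at most 2 failures can be reported for one message. */
size_t check_first_last(const uint8_t *buf, size_t len, uint8_t seed)
{
    size_t failures = 0;
    if (len == 0)
        return 0;
    if (buf[0] != pattern_byte(0, seed))
        failures++;
    if (len > 1 && buf[len - 1] != pattern_byte(len - 1, seed))
        failures++;
    return failures;
}

/* --integrity style check: every byte is compared, so a message that never
 * overwrote the receive buffer shows up as roughly len failures. */
size_t check_all_bytes(const uint8_t *buf, size_t len, uint8_t seed)
{
    assert(buf != NULL || len == 0);
    size_t failures = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern_byte(i, seed))
            failures++;
    return failures;
}
```

For instance, corrupting a single byte in the middle of a 1000-byte buffer passes the first/last check but gives exactly 1 failure in the full scan, while a buffer still holding the previous message's pattern gives 2 and 1000 failures respectively.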
Details on the system and software installation are below, followed by several NetPIPE runs illustrating the errors. These include a minimal case of 3 ping-pong messages where the middle one fails. Let me know if you need any more information, or if there are any additional tests I can run.

Dave Turner

CentOS 7 on Intel processors, QDR IB and 40 GbE tests

UCX 1.5.0 installed from the tarball according to the docs on the webpage

OpenMPI-4.0.1 configured for verbs with:
./configure F77=ifort FC=ifort --prefix=/homes/daveturner/libs/openmpi-4.0.1-verbs --enable-mpirun-prefix-by-default --enable-mpi-fortran=all --enable-mpi-cxx --enable-ipv6 --with-verbs --with-slurm --disable-dlopen

OpenMPI-4.0.1 configured for UCX with:
./configure F77=ifort FC=ifort --prefix=/homes/daveturner/libs/openmpi-4.0.1-ucx --enable-mpirun-prefix-by-default --enable-mpi-fortran=all --enable-mpi-cxx --enable-ipv6 --without-verbs --with-slurm --disable-dlopen --with-ucx=/homes/daveturner/libs/ucx-1.5.0/install

NetPIPE compiled with:
/homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpicc -g -O3 -Wall -lrt -DMPI ./src/netpipe.c ./src/mpi.c -o NPmpi-4.0.1-ucx -I./src
(http://netpipe.cs.ksu.edu/ compiled with 'make mpi')

**************************************************************************************
Normal uni-directional point-to-point test shows errors (testing first and last bytes) for messages over 300 KB.
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile hf.elf NPmpi-4.0.1-ucx -o np.elf.mpi-4.0.1-ucx-ib --printhostnames
Saving output to np.elf.mpi-4.0.1-ucx-ib
Proc 0 is on host elf77
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs
Clock accuracy ~ 38.000 nsecs
Start testing with 7 trials for each message size
1: 1 B 24999 times --> 3.766 Mbps in 2.124 usecs
2: 2 B 117702 times --> 8.386 Mbps in 1.908 usecs
3: 3 B 131032 times --> 12.633 Mbps in 1.900 usecs
4: 4 B 131592 times --> 16.715 Mbps in 1.914 usecs
5: 6 B 130589 times --> 25.077 Mbps in 1.914 usecs
6: 8 B 130608 times --> 33.402 Mbps in 1.916 usecs
7: 12 B 130477 times --> 50.047 Mbps in 1.918 usecs
8: 13 B 130329 times --> 54.872 Mbps in 1.895 usecs
9: 16 B 131904 times --> 67.187 Mbps in 1.905 usecs
10: 19 B 131225 times --> 79.255 Mbps in 1.918 usecs
11: 21 B 130353 times --> 87.118 Mbps in 1.928 usecs
12: 24 B 129640 times --> 99.831 Mbps in 1.923 usecs
13: 27 B 129988 times --> 111.760 Mbps in 1.933 usecs
14: 29 B 129351 times --> 121.048 Mbps in 1.917 usecs
15: 32 B 130439 times --> 132.620 Mbps in 1.930 usecs
16: 35 B 129511 times --> 144.272 Mbps in 1.941 usecs
17: 45 B 128814 times --> 182.881 Mbps in 1.968 usecs
18: 48 B 127000 times --> 194.231 Mbps in 1.977 usecs
19: 51 B 126452 times --> 206.193 Mbps in 1.979 usecs
20: 61 B 126343 times --> 236.168 Mbps in 2.066 usecs
21: 64 B 120987 times --> 244.690 Mbps in 2.092 usecs
22: 67 B 119477 times --> 256.660 Mbps in 2.088 usecs
23: 93 B 119710 times --> 242.428 Mbps in 3.069 usecs
24: 96 B 81460 times --> 250.503 Mbps in 3.066 usecs
25: 99 B 81543 times --> 258.376 Mbps in 3.065 usecs
26: 125 B 81558 times --> 321.127 Mbps in 3.114 usecs
27: 128 B 80281 times --> 328.788 Mbps in 3.114 usecs
28: 131 B 80270 times --> 336.387 Mbps in 3.115 usecs
29: 189 B 80244 times --> 474.304 Mbps in 3.188 usecs
30: 192 B 78423 times --> 482.258 Mbps in 3.185 usecs
31: 195 B 78492 times --> 489.635 Mbps in 3.186 usecs
32: 253 B 78467 times --> 623.891 Mbps in 3.244 usecs
33: 256 B 77061 times --> 631.098 Mbps in 3.245 usecs
34: 259 B 77038 times --> 637.905 Mbps in 3.248 usecs
35: 381 B 76967 times --> 906.297 Mbps in 3.363 usecs
36: 384 B 74335 times --> 913.387 Mbps in 3.363 usecs
37: 387 B 74331 times --> 921.348 Mbps in 3.360 usecs
38: 509 B 74398 times --> 1.166 Gbps in 3.493 usecs
39: 512 B 71575 times --> 1.176 Gbps in 3.484 usecs
40: 515 B 71755 times --> 1.183 Gbps in 3.483 usecs
41: 765 B 71780 times --> 1.614 Gbps in 3.793 usecs
42: 768 B 65912 times --> 1.623 Gbps in 3.787 usecs
43: 771 B 66023 times --> 1.630 Gbps in 3.785 usecs
44: 1.021 KB 66050 times --> 2.034 Gbps in 4.016 usecs
45: 1.024 KB 62257 times --> 2.043 Gbps in 4.010 usecs
46: 1.027 KB 62338 times --> 2.050 Gbps in 4.007 usecs
47: 1.533 KB 62387 times --> 2.699 Gbps in 4.545 usecs
48: 1.536 KB 55010 times --> 2.708 Gbps in 4.538 usecs
49: 1.539 KB 55084 times --> 2.708 Gbps in 4.547 usecs
50: 2.045 KB 54978 times --> 3.216 Gbps in 5.086 usecs
51: 2.048 KB 49150 times --> 3.222 Gbps in 5.085 usecs
52: 2.051 KB 49166 times --> 3.225 Gbps in 5.088 usecs
53: 3.069 KB 49139 times --> 4.488 Gbps in 5.471 usecs
54: 3.072 KB 45697 times --> 4.494 Gbps in 5.469 usecs
55: 3.075 KB 45713 times --> 4.496 Gbps in 5.472 usecs
56: 4.093 KB 45686 times --> 5.570 Gbps in 5.878 usecs
57: 4.096 KB 42528 times --> 5.568 Gbps in 5.885 usecs
58: 4.099 KB 42483 times --> 5.550 Gbps in 5.909 usecs
59: 6.141 KB 42310 times --> 7.342 Gbps in 6.692 usecs
60: 6.144 KB 37359 times --> 7.315 Gbps in 6.719 usecs
61: 6.147 KB 37206 times --> 7.343 Gbps in 6.697 usecs
62: 8.189 KB 37328 times --> 8.286 Gbps in 7.907 usecs
63: 8.192 KB 31619 times --> 8.265 Gbps in 7.930 usecs
64: 8.195 KB 31527 times --> 8.254 Gbps in 7.943 usecs
65: 12.285 KB 31476 times --> 11.154 Gbps in 8.811 usecs
66: 12.288 KB 28373 times --> 11.157 Gbps in 8.811 usecs
67: 12.291 KB 28372 times --> 11.116 Gbps in 8.846 usecs
68: 16.381 KB 28262 times --> 12.525 Gbps in 10.463 usecs
69: 16.384 KB 23893 times --> 12.501 Gbps in 10.485 usecs
70: 16.387 KB 23843 times --> 12.502 Gbps in 10.486 usecs
71: 24.573 KB 23841 times --> 15.127 Gbps in 12.995 usecs
72: 24.576 KB 19237 times --> 15.120 Gbps in 13.003 usecs
73: 24.579 KB 19226 times --> 15.115 Gbps in 13.009 usecs
74: 32.765 KB 19217 times --> 17.114 Gbps in 15.316 usecs
75: 32.768 KB 16323 times --> 17.133 Gbps in 15.301 usecs
76: 32.771 KB 16339 times --> 17.117 Gbps in 15.316 usecs
77: 49.149 KB 16322 times --> 19.686 Gbps in 19.973 usecs
78: 49.152 KB 12516 times --> 19.644 Gbps in 20.017 usecs
79: 49.155 KB 12489 times --> 19.635 Gbps in 20.027 usecs
80: 65.533 KB 12483 times --> 21.295 Gbps in 24.619 usecs
81: 65.536 KB 10154 times --> 21.277 Gbps in 24.641 usecs
82: 65.539 KB 10145 times --> 21.265 Gbps in 24.656 usecs
83: 98.301 KB 10139 times --> 23.107 Gbps in 34.034 usecs
84: 98.304 KB 7345 times --> 23.137 Gbps in 33.990 usecs
85: 98.307 KB 7355 times --> 23.089 Gbps in 34.063 usecs
86: 131.069 KB 7339 times --> 24.208 Gbps in 43.314 usecs
87: 131.072 KB 5771 times --> 24.218 Gbps in 43.297 usecs
88: 131.075 KB 5774 times --> 24.192 Gbps in 43.345 usecs
89: 196.605 KB 5767 times --> 25.365 Gbps in 62.008 usecs
90: 196.608 KB 4031 times --> 25.326 Gbps in 62.106 usecs
91: 196.611 KB 4025 times --> 25.365 Gbps in 62.011 usecs
92: 262.141 KB 4031 times --> 26.066 Gbps in 80.454 usecs
93: 262.144 KB 3107 times --> 27.495 Gbps in 76.275 usecs
94: 262.147 KB 3277 times --> 27.162 Gbps in 77.210 usecs
95: 393.213 KB 3237 times --> 28.291 Gbps in 111.192 usecs 1 failures
96: 393.216 KB 2248 times --> 28.529 Gbps in 110.265 usecs 31472 failures
97: 393.219 KB 2267 times --> 28.360 Gbps in 110.922 usecs 1 failures
98: 524.285 KB 2253 times --> 28.830 Gbps in 145.483 usecs 1 failures
99: 524.288 KB 1718 times --> 28.869 Gbps in 145.288 usecs 24052 failures
100: 524.291 KB 1720 times --> 29.043 Gbps in 144.417 usecs 1 failures
101: 786.429 KB 1731 times --> 29.451 Gbps in 213.626 usecs 1 failures
102: 786.432 KB 1170 times --> 29.383 Gbps in 214.122 usecs 16380 failures
103: 786.435 KB 1167 times --> 29.481 Gbps in 213.408 usecs 1 failures
104: 1.049 MB 1171 times --> 29.791 Gbps in 281.580 usecs 1 failures
105: 1.049 MB 887 times --> 29.801 Gbps in 281.485 usecs 12418 failures
106: 1.049 MB 888 times --> 29.694 Gbps in 282.505 usecs 1 failures
107: 1.573 MB 884 times --> 30.118 Gbps in 417.786 usecs 1 failures
108: 1.573 MB 598 times --> 30.140 Gbps in 417.489 usecs 8372 failures
109: 1.573 MB 598 times --> 30.032 Gbps in 418.985 usecs 1 failures
110: 2.097 MB 596 times --> 30.179 Gbps in 555.919 usecs 1 failures
111: 2.097 MB 449 times --> 30.161 Gbps in 556.255 usecs 6286 failures
112: 2.097 MB 449 times --> 30.199 Gbps in 555.549 usecs 1 failures
113: 3.146 MB 450 times --> 30.372 Gbps in 828.586 usecs 1 failures
114: 3.146 MB 301 times --> 30.302 Gbps in 830.498 usecs 4214 failures
115: 3.146 MB 301 times --> 30.413 Gbps in 827.462 usecs 1 failures
116: 4.194 MB 302 times --> 30.442 Gbps in 1.102 msecs 1 failures
117: 4.194 MB 226 times --> 30.443 Gbps in 1.102 msecs 3164 failures
118: 4.194 MB 226 times --> 30.342 Gbps in 1.106 msecs 1 failures
119: 6.291 MB 226 times --> 29.276 Gbps in 1.719 msecs
120: 6.291 MB 145 times --> 29.274 Gbps in 1.719 msecs 2030 failures
121: 6.291 MB 145 times --> 29.199 Gbps in 1.724 msecs
122: 8.389 MB 145 times --> 29.012 Gbps in 2.313 msecs 1 failures
123: 8.389 MB 108 times --> 29.046 Gbps in 2.310 msecs 1512 failures
124: 8.389 MB 108 times --> 29.010 Gbps in 2.313 msecs 1 failures
Completed with max bandwidth 30.299 Gbps 1.931 usecs latency

**************************************************************************************
Uni-directional point-to-point test with integrity check: this does just 1 test for each message size, but tests all bytes, not just the first and last bytes.
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity
Proc 0 is on host elf77
Doing a message integrity check instead of measuring performance
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs
Clock accuracy ~ 39.000 nsecs
Start testing with 1 trials for each message size
1: 1 B 24999 times --> 0 failures
2: 2 B 110029 times --> 0 failures
3: 3 B 111886 times --> 0 failures
4: 4 B 129467 times --> 0 failures
5: 6 B 129909 times --> 0 failures
6: 8 B 129157 times --> 0 failures
7: 12 B 128443 times --> 0 failures
8: 13 B 127507 times --> 0 failures
9: 16 B 126905 times --> 0 failures
10: 19 B 126650 times --> 0 failures
11: 21 B 125962 times --> 0 failures
12: 24 B 123901 times --> 0 failures
13: 27 B 124594 times --> 0 failures
14: 29 B 124139 times --> 0 failures
15: 32 B 123816 times --> 0 failures
16: 35 B 123149 times --> 0 failures
17: 45 B 122853 times --> 0 failures
18: 48 B 117985 times --> 0 failures
19: 51 B 117168 times --> 0 failures
20: 61 B 116449 times --> 0 failures
21: 64 B 110200 times --> 0 failures
22: 67 B 109647 times --> 0 failures
23: 93 B 108702 times --> 0 failures
24: 96 B 73950 times --> 0 failures
25: 99 B 74284 times --> 0 failures
26: 125 B 73998 times --> 0 failures
27: 128 B 71432 times --> 0 failures
28: 131 B 70523 times --> 0 failures
29: 189 B 71108 times --> 0 failures
30: 192 B 66161 times --> 0 failures
31: 195 B 66110 times --> 0 failures
32: 253 B 65814 times --> 0 failures
33: 256 B 62284 times --> 0 failures
34: 259 B 61922 times --> 0 failures
35: 381 B 61869 times --> 0 failures
36: 384 B 55731 times --> 0 failures
37: 387 B 55543 times --> 0 failures
38: 509 B 55352 times --> 0 failures
39: 512 B 50377 times --> 0 failures
40: 515 B 50237 times --> 0 failures
41: 765 B 50128 times --> 0 failures
42: 768 B 41593 times --> 0 failures
43: 771 B 41667 times --> 0 failures
44: 1.021 KB 41022 times --> 0 failures
45: 1.024 KB 35848 times --> 0 failures
46: 1.027 KB 35854 times --> 0 failures
47: 1.533 KB 35714 times --> 0 failures
48: 1.536 KB 27686 times --> 0 failures
49: 1.539 KB 27646 times --> 0 failures
50: 2.045 KB 27588 times --> 0 failures
51: 2.048 KB 22461 times --> 0 failures
52: 2.051 KB 22423 times --> 0 failures
53: 3.069 KB 22397 times --> 0 failures
54: 3.072 KB 17121 times --> 0 failures
55: 3.075 KB 17115 times --> 0 failures
56: 4.093 KB 17103 times --> 0 failures
57: 4.096 KB 13806 times --> 0 failures
58: 4.099 KB 13859 times --> 0 failures
59: 6.141 KB 13825 times --> 0 failures
60: 6.144 KB 10018 times --> 0 failures
61: 6.147 KB 10028 times --> 0 failures
62: 8.189 KB 10025 times --> 0 failures
63: 8.192 KB 7746 times --> 0 failures
64: 8.195 KB 7745 times --> 0 failures
65: 12.285 KB 7754 times --> 0 failures
66: 12.288 KB 5506 times --> 0 failures
67: 12.291 KB 5494 times --> 0 failures
68: 16.381 KB 5365 times --> 0 failures
69: 16.384 KB 4221 times --> 0 failures
70: 16.387 KB 4223 times --> 0 failures
71: 24.573 KB 4195 times --> 0 failures
72: 24.576 KB 3003 times --> 0 failures
73: 24.579 KB 3012 times --> 0 failures
74: 32.765 KB 3024 times --> 0 failures
75: 32.768 KB 2321 times --> 0 failures
76: 32.771 KB 2324 times --> 0 failures
77: 49.149 KB 2322 times --> 0 failures
78: 49.152 KB 1557 times --> 0 failures
79: 49.155 KB 1558 times --> 0 failures
80: 65.533 KB 1554 times --> 0 failures
81: 65.536 KB 1198 times --> 0 failures
82: 65.539 KB 1200 times --> 0 failures
83: 98.301 KB 1199 times --> 0 failures
84: 98.304 KB 808 times --> 0 failures
85: 98.307 KB 808 times --> 0 failures
86: 131.069 KB 808 times --> 0 failures
87: 131.072 KB 609 times --> 0 failures
88: 131.075 KB 609 times --> 0 failures
89: 196.605 KB 609 times --> 0 failures
90: 196.608 KB 410 times --> 0 failures
91: 196.611 KB 410 times --> 0 failures
92: 262.141 KB 410 times --> 0 failures
93: 262.144 KB 309 times --> 0 failures
94: 262.147 KB 284 times --> 0 failures
95: 393.213 KB 283 times --> 393212 failures
96: 393.216 KB 190 times --> 1180022 failures
97: 393.219 KB 206 times --> 393218 failures
98: 524.285 KB 189 times --> 524284 failures
99: 524.288 KB 143 times --> 1573144 failures
100: 524.291 KB 155 times --> 524290 failures
101: 786.429 KB 143 times --> 786428 failures
102: 786.432 KB 95 times --> 2359480 failures
103: 786.435 KB 103 times --> 786434 failures
104: 1.049 MB 95 times --> 1048572 failures
105: 1.049 MB 72 times --> 3145866 failures
106: 1.049 MB 77 times --> 1048578 failures
107: 1.573 MB 71 times --> 1572860 failures
108: 1.573 MB 48 times --> 4718682 failures
109: 1.573 MB 51 times --> 1572866 failures
110: 2.097 MB 48 times --> 2097148 failures
111: 2.097 MB 36 times --> 6291522 failures
112: 2.097 MB 38 times --> 2097154 failures
113: 3.146 MB 36 times --> 0 failures
114: 3.146 MB 24 times --> 9437226 failures
115: 3.146 MB 25 times --> 0 failures
116: 4.194 MB 24 times --> 4194300 failures
117: 4.194 MB 18 times --> 12582942 failures
118: 4.194 MB 18 times --> 4194306 failures
119: 6.291 MB 18 times --> 6291452 failures
120: 6.291 MB 12 times --> 18874386 failures
121: 6.291 MB 12 times --> 6291458 failures
122: 8.389 MB 12 times --> 8388604 failures
123: 8.389 MB 9 times --> 25165836 failures
124: 8.389 MB 9 times --> 8388610 failures
Completed with max bandwidth 2.596 Gbps 2.013 usecs latency

**************************************************************************************
Minimal uni-directional point-to-point test with just 3 messages passed round trip, then the same test with TCP only (--mca btl tcp,self), showing no failures when UCX is not used.
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity --start 393216 --end 393216 --repeats 1
Proc 0 is on host elf77
Proc 1 is on host elf78
Doing a message integrity check instead of measuring performance
Using a constant number of 1 transmissions
NOTE: Be leary of timings that are close to the clock accuracy.
Clock resolution ~ 1.000 nsecs
Clock accuracy ~ 39.000 nsecs
Start testing with 1 trials for each message size
1: 393.213 KB 1 times --> 0 failures
2: 393.216 KB 1 times --> 786430 failures
3: 393.219 KB 1 times --> 0 failures
Completed with max bandwidth 257.855 Mbps 6.496 msecs latency

Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --mca btl tcp,self --hostfile hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity --start 393216 --end 393216 --repeats 1
Proc 0 is on host elf77
Doing a message integrity check instead of measuring performance
Using a constant number of 1 transmissions
NOTE: Be leary of timings that are close to the clock accuracy.
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs
Clock accuracy ~ 33.000 nsecs
Start testing with 1 trials for each message size
1: 393.213 KB 1 times --> 0 failures
2: 393.216 KB 1 times --> 0 failures
3: 393.219 KB 1 times --> 0 failures
Completed with max bandwidth 232.044 Mbps 7.004 msecs latency

**************************************************************************************
Uni-directional point-to-point test with integrity check has no failures when restricted with --pert 0 to the unperturbed message sizes (only the 2^n and 1.5 x 2^n sizes, with no ±3-byte offsets). However, the normal performance test with --pert 0, which sends many more messages of each size, still shows some failures.
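The message sizes in these runs come in triplets such as 393213, 393216 and 393219 bytes. Apart from the smallest sizes, the pattern in the output is a base size of 2^k or 1.5 x 2^k bytes bracketed by ±3 bytes, and with --pert 0 only the base sizes remain. The C sketch below reproduces that pattern; it is a reconstruction from the output in this post, not NetPIPE's source, and size_schedule and push_size are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Append size s to out[] if it lies in [lo, hi], is larger than the last
 * size written (drops duplicates and out-of-order small sizes) and there is
 * room left. */
static void push_size(size_t s, size_t lo, size_t hi,
                      size_t *out, size_t max_out, size_t *n)
{
    if (s < lo || s > hi || *n >= max_out)
        return;
    if (*n > 0 && s <= out[*n - 1])
        return;
    out[(*n)++] = s;
}

/* Hypothetical reconstruction of the size schedule seen in the runs:
 * base sizes 2^k and 1.5 * 2^k, each bracketed by base - pert and
 * base + pert. With pert = 0 only the base sizes remain. Only matches the
 * observed output for the larger sizes; assumes hi and pert are far below
 * SIZE_MAX. Returns the number of sizes written to out. */
size_t size_schedule(size_t lo, size_t hi, size_t pert,
                     size_t *out, size_t max_out)
{
    assert(out != NULL || max_out == 0);
    size_t n = 0;
    for (size_t p = 1; p <= hi; p *= 2) {
        size_t bases[2] = { p, p + p / 2 };   /* 2^k and 1.5 * 2^k */
        for (int b = 0; b < 2; b++) {
            size_t base = bases[b];
            if (pert > 0 && base > pert)
                push_size(base - pert, lo, hi, out, max_out, &n);
            push_size(base, lo, hi, out, max_out, &n);
            if (pert > 0)
                push_size(base + pert, lo, hi, out, max_out, &n);
        }
        if (p > hi / 2)
            break;                            /* doubling would pass hi */
    }
    return n;
}
```

For example, asking for sizes between 393000 and 393300 bytes with pert = 3 yields 393213, 393216 and 393219, the three sizes of the minimal test; with pert = 0 only 393216 remains.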
**************************************************************************************
Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile hf.elf NPmpi-4.0.1-ucx --printhostnames --integrity --repeats 1 --pert 0
Proc 0 is on host elf77
Doing a message integrity check instead of measuring performance
Using a constant number of 1 transmissions
NOTE: Be leary of timings that are close to the clock accuracy.
Clock resolution ~ 1.000 nsecs
Clock accuracy ~ 34.000 nsecs
Start testing with 1 trials for each message size
1: 1 B 1 times --> 0 failures
Proc 1 is on host elf78
2: 2 B 1 times --> 0 failures
3: 3 B 1 times --> 0 failures
4: 4 B 1 times --> 0 failures
5: 6 B 1 times --> 0 failures
6: 8 B 1 times --> 0 failures
7: 12 B 1 times --> 0 failures
8: 16 B 1 times --> 0 failures
9: 24 B 1 times --> 0 failures
10: 32 B 1 times --> 0 failures
11: 48 B 1 times --> 0 failures
12: 64 B 1 times --> 0 failures
13: 96 B 1 times --> 0 failures
14: 128 B 1 times --> 0 failures
15: 192 B 1 times --> 0 failures
16: 256 B 1 times --> 0 failures
17: 384 B 1 times --> 0 failures
18: 512 B 1 times --> 0 failures
19: 768 B 1 times --> 0 failures
20: 1.024 KB 1 times --> 0 failures
21: 1.536 KB 1 times --> 0 failures
22: 2.048 KB 1 times --> 0 failures
23: 3.072 KB 1 times --> 0 failures
24: 4.096 KB 1 times --> 0 failures
25: 6.144 KB 1 times --> 0 failures
26: 8.192 KB 1 times --> 0 failures
27: 12.288 KB 1 times --> 0 failures
28: 16.384 KB 1 times --> 0 failures
29: 24.576 KB 1 times --> 0 failures
30: 32.768 KB 1 times --> 0 failures
31: 49.152 KB 1 times --> 0 failures
32: 65.536 KB 1 times --> 0 failures
33: 98.304 KB 1 times --> 0 failures
34: 131.072 KB 1 times --> 0 failures
35: 196.608 KB 1 times --> 0 failures
36: 262.144 KB 1 times --> 0 failures
37: 393.216 KB 1 times --> 0 failures
38: 524.288 KB 1 times --> 0 failures
39: 786.432 KB 1 times --> 0 failures
40: 1.049 MB 1 times --> 0 failures
41: 1.573 MB 1 times --> 0 failures
42: 2.097 MB 1 times --> 0 failures
43: 3.146 MB 1 times --> 0 failures
44: 4.194 MB 1 times --> 0 failures
45: 6.291 MB 1 times --> 0 failures
46: 8.389 MB 1 times --> 0 failures
Completed with max bandwidth 1.108 Gbps 4.775 usecs latency

Elf77 /homes/daveturner/libs/openmpi-4.0.1-ucx/bin/mpirun -np 2 --hostfile hf.elf NPmpi-4.0.1-ucx --printhostnames --pert 0
Proc 0 is on host elf77
Proc 1 is on host elf78
Clock resolution ~ 1.000 nsecs
Clock accuracy ~ 33.000 nsecs
Start testing with 7 trials for each message size
1: 1 B 24999 times --> 3.792 Mbps in 2.110 usecs
2: 2 B 118504 times --> 8.337 Mbps in 1.919 usecs
3: 3 B 130269 times --> 12.513 Mbps in 1.918 usecs
4: 4 B 130341 times --> 16.568 Mbps in 1.931 usecs
5: 6 B 129437 times --> 24.877 Mbps in 1.929 usecs
6: 8 B 129569 times --> 33.003 Mbps in 1.939 usecs
7: 12 B 128919 times --> 49.425 Mbps in 1.942 usecs
8: 16 B 128711 times --> 65.956 Mbps in 1.941 usecs
9: 24 B 128820 times --> 98.671 Mbps in 1.946 usecs
10: 32 B 128477 times --> 130.950 Mbps in 1.955 usecs
11: 48 B 127880 times --> 191.863 Mbps in 2.001 usecs
12: 64 B 124910 times --> 242.916 Mbps in 2.108 usecs
13: 96 B 118611 times --> 249.356 Mbps in 3.080 usecs
14: 128 B 81170 times --> 326.676 Mbps in 3.135 usecs
15: 192 B 79754 times --> 479.527 Mbps in 3.203 usecs
16: 256 B 78047 times --> 627.600 Mbps in 3.263 usecs
17: 384 B 76611 times --> 904.073 Mbps in 3.398 usecs
18: 512 B 73573 times --> 1.170 Gbps in 3.502 usecs
19: 768 B 71385 times --> 1.615 Gbps in 3.805 usecs
20: 1.024 KB 65696 times --> 2.033 Gbps in 4.029 usecs
21: 1.536 KB 62047 times --> 2.695 Gbps in 4.560 usecs
22: 2.048 KB 54822 times --> 3.210 Gbps in 5.105 usecs
23: 3.072 KB 48974 times --> 4.485 Gbps in 5.480 usecs
24: 4.096 KB 45623 times --> 5.572 Gbps in 5.881 usecs
25: 6.144 KB 42511 times --> 7.320 Gbps in 6.715 usecs
26: 8.192 KB 37230 times --> 8.274 Gbps in 7.921 usecs
27: 12.288 KB 31561 times --> 11.203 Gbps in 8.774 usecs
28: 16.384 KB 28491 times --> 12.503 Gbps in 10.483 usecs
29: 24.576 KB 23847 times --> 15.161 Gbps in 12.968 usecs
30: 32.768 KB 19278 times --> 17.159 Gbps in 15.278 usecs
31: 49.152 KB 16363 times --> 19.749 Gbps in 19.911 usecs
32: 65.536 KB 12555 times --> 21.306 Gbps in 24.608 usecs
33: 98.304 KB 10159 times --> 23.167 Gbps in 33.946 usecs
34: 131.072 KB 7364 times --> 24.182 Gbps in 43.361 usecs
35: 196.608 KB 5765 times --> 25.415 Gbps in 61.887 usecs
36: 262.144 KB 4039 times --> 27.430 Gbps in 76.454 usecs
37: 393.216 KB 3269 times --> 28.316 Gbps in 111.095 usecs
38: 524.288 KB 2250 times --> 28.794 Gbps in 145.667 usecs 1 failures
39: 786.432 KB 1716 times --> 29.399 Gbps in 214.004 usecs 1 failures
40: 1.049 MB 1168 times --> 29.739 Gbps in 282.073 usecs 1 failures
41: 1.573 MB 886 times --> 30.059 Gbps in 418.613 usecs
42: 2.097 MB 597 times --> 30.099 Gbps in 557.407 usecs
43: 3.146 MB 448 times --> 30.408 Gbps in 827.607 usecs 1 failures
44: 4.194 MB 302 times --> 30.256 Gbps in 1.109 msecs 1 failures
45: 6.291 MB 225 times --> 29.272 Gbps in 1.719 msecs 1 failures
46: 8.389 MB 145 times --> 29.010 Gbps in 2.313 msecs 1 failures
Completed with max bandwidth 30.112 Gbps 1.953 usecs latency

--
Work: davetur...@ksu.edu (785) 532-7791
2219 Engineering Hall, Manhattan KS 66506
Home: drdavetur...@gmail.com cell: (785) 770-5929
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel