Hi, on one node ./IOR runs with OpenMPI, but on two nodes it fails with "[connect/btl_openib_connect_udcm.c:1575:udcm_wait_for_send_completion] send failed with verbs status 2"
One node:

[root@vcn03 C]# mpirun --allow-run-as-root -np 1 -host vcn03 ./IOR
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            vcn03
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[33605,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
IOR-2.10.3: MPI Coordinated Test of Parallel I/O

Run began: Tue Mar 13 11:50:15 2018
Command line used: ./IOR
Machine: Linux vcn03

Summary:
        api                = POSIX
        test filename      = testFile
        access             = single-shared-file
        ordering in a file = sequential offsets
        ordering inter file= no tasks offsets
        clients            = 1 (1 per node)
        repetitions        = 1
        xfersize           = 262144 bytes
        blocksize          = 1 MiB
        aggregate filesize = 1 MiB

Operation  Max (MiB)  Min (MiB)  Mean (MiB)   Std Dev  Max (OPs)  Min (OPs)  Mean (OPs)   Std Dev  Mean (s)
---------  ---------  ---------  ----------   -------  ---------  ---------  ----------   -------  --------
write         312.36     312.36      312.36      0.00    1249.44    1249.44     1249.44      0.00   0.00320   EXCEL
read          996.42     996.42      996.42      0.00    3985.69    3985.69     3985.69      0.00   0.00100   EXCEL

Max Write: 312.36 MiB/sec (327.53 MB/sec)
Max Read:  996.42 MiB/sec (1044.82 MB/sec)

Run finished: Tue Mar 13 11:50:15 2018

Two-node run:

[root@vcn03 C]# mpirun --allow-run-as-root -np 2 -host vcn03,vcn04 ./IOR
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            vcn04
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[33640,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn04][[33640,1],1][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
mlx5: vcn04: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 78006802 0a00016f 00005bd2
[vcn04][[33640,1],1][connect/btl_openib_connect_udcm.c:1575:udcm_wait_for_send_completion] send failed with verbs status 2
[vcn04:28705] *** An error occurred in MPI_Send
[vcn04:28705] *** reported by process [2204631041,1]
[vcn04:28705] *** on communicator MPI_COMM_WORLD
[vcn04:28705] *** MPI_ERR_OTHER: known error not in list
[vcn04:28705] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[vcn04:28705] ***    and potentially your MPI job)
[vcn03:05349] 1 more process has sent help message help-mpi-btl-openib.txt / no device params found
[vcn03:05349] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[root@vcn03 C]#

________________________________________
From: devel [[email protected]] on behalf of Pharthiphan Asokan [[email protected]]
Sent: Tuesday, March 13, 2018 9:13 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] How to Build OpenMPI to support FDR over SR-IOV

Hi Jeff,

After adding PATH and LD_LIBRARY_PATH, I no longer see the "orted: command not found" issue.

[root@vcn03 pasokan]# mpirun --allow-run-as-root -np 4 -host vcn03,vcn03,vcn04,vcn04 /mnt/lustre_client/pasokan/a.out
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            vcn03
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn04][[33859,1],2][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn03][[33859,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn03][[33859,1],1][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
[vcn04][[33859,1],3][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
Hello world from processor vcn03, rank 0 out of 4 processors
Hello world from processor vcn03, rank 1 out of 4 processors
Hello world from processor vcn04, rank 2 out of 4 processors
Hello world from processor vcn04, rank 3 out of 4 processors
[vcn03:05070] 3 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[vcn03:05070] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[root@vcn03 pasokan]#

But IOR compiled with OpenMPI still doesn't run; it throws a segmentation fault. This used to be very straightforward on bare metal, but not on KVM + SR-IOV.

________________________________________
From: Pharthiphan Asokan
Sent: Tuesday, March 13, 2018 8:42 PM
To: Open MPI Developers
Subject: RE: [OMPI devel] How to Build OpenMPI to support FDR over SR-IOV

Thanks Jeff,

OpenMPI is installed here:

[root@vcn03 C]# cd /mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/
bin/     etc/     include/ lib/     share/
[root@vcn03 C]#

Why is exporting these variables not taking effect?

export PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/bin:$PATH
export LD_LIBRARY_PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/lib:$LD_LIBRARY_PATH
export INCLUDE=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/include:$INCLUDE

But, as suggested, providing --prefix /mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/ works:

[root@vcn03 C]# mpirun --prefix /mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/ --allow-run-as-root -np 2 -host vcn03,vcn04 hostname
vcn04
vcn03
[root@vcn03 C]#

However, my issue is that IOR doesn't run when compiled with OpenMPI in the SR-IOV environment:

[root@vcn03 C]# pwd
/mnt/lustre_client/pasokan/IOR-July12/src/C
[root@vcn03 C]#
[root@vcn03 C]# export PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/bin:$PATH
[root@vcn03 C]# export LD_LIBRARY_PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/lib:$LD_LIBRARY_PATH
[root@vcn03 C]# export INCLUDE=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/include:$INCLUDE
[root@vcn03 C]#
[root@vcn03 C]# gmake posix mpiio
mpicc -o IOR IOR.o utilities.o parse_options.o \
        aiori-POSIX.o aiori-noMPIIO.o aiori-noHDF5.o aiori-noNCMPI.o \
        -lm
mpicc -o IOR IOR.o utilities.o parse_options.o \
        aiori-POSIX.o aiori-MPIIO.o aiori-noHDF5.o aiori-noNCMPI.o \
        -lm
[root@vcn03 C]# ./IOR
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            vcn03
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4114

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[34068,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
Segmentation fault
[root@vcn03 C]#

Please help!

________________________________________
From: devel [[email protected]] on behalf of Jeff Squyres (jsquyres) [[email protected]]
Sent: Tuesday, March 13, 2018 8:20 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] How to Build OpenMPI to support FDR over SR-IOV

On Mar 13, 2018, at 2:08 AM, Pharthiphan Asokan <[email protected]> wrote:
>
> [root@vcn03 C]# mpirun --allow-run-as-root -np 2 -host vcn03,vcn04 hostname
> bash: orted: command not found

This is the key ^^

These FAQ items may help:

* https://www.open-mpi.org/faq/?category=running#run-prereqs.
* https://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
* https://www.open-mpi.org/faq/?category=running#mpirun-prefix

--
Jeff Squyres
[email protected]

_______________________________________________
devel mailing list
[email protected]
https://lists.open-mpi.org/mailman/listinfo/devel
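On the "orted: command not found" exchange in this thread: variables exported in the interactive shell on vcn03 are not seen by the non-interactive shell that mpirun starts on vcn04 (presumably over ssh), which is one likely reason the exports appeared to have no effect. Below is a minimal sketch of the startup-file approach described in the FAQ links, assuming bash is the login shell on both nodes and that ~/.bashrc is read for non-interactive remote commands; neither assumption is confirmed in the thread. OMPI_PREFIX is a variable name introduced here for illustration only.

```shell
# Sketch only: lines that could be placed near the top of ~/.bashrc on every
# node (vcn03 and vcn04), before any "return if not interactive" guard.
# OMPI_PREFIX is a hypothetical variable name; the path is the install
# directory shown in the thread.
OMPI_PREFIX=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0

# Put the Open MPI binaries (mpirun, orted, mpicc) first on PATH.
export PATH="$OMPI_PREFIX/bin:$PATH"

# Put the Open MPI libraries first on the loader path; the ${VAR:+...} form
# avoids a trailing ":" when LD_LIBRARY_PATH was previously unset.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Under the same assumptions, running "ssh vcn04 which orted" from vcn03 is a quick way to check whether the remote shell picks up the setting. Passing --prefix to mpirun, as in the working command in the thread, avoids depending on startup files at all.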
