All,

I installed ParaView/5.8.0-Python-3.8.2-mpi and OpenFOAM/8, both built with the foss/2020a toolchain.
Running MPI on a single node works fine; however, with multiple nodes it crashes as follows:

[node801:25722:0:25722] ib_mlx5_log.c:132 Transport retry count exceeded on mlx5_0:1/IB (synd 0x15 vend 0x81 hw_synd 0/0)
[node801:25722:0:25722] ib_mlx5_log.c:132 DCI QP 0x3685 wqe[20]: SEND s-e [rqpn 0x12c88 rlid 1] [va 0x37fd600 len 6711 lkey 0x2c188]
==== backtrace (tid: 25722) ====
 0 0x00000000000214ae ucs_debug_print_backtrace() /home/e/easybuild/.local/easybuild/build/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/ucs/debug/debug.c:653
 1 0x000000000001ff00 uct_ib_mlx5_completion_with_err() /home/e/easybuild/.local/easybuild/build/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/uct/ib/mlx5/ib_mlx5_log.c:132
 2 0x000000000005982e uct_ib_mlx5_poll_cq() /home/e/easybuild/.local/easybuild/build/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/uct/ib/mlx5/ib_mlx5.inl:81
 3 0x000000000005982e uct_dc_mlx5_iface_progress() /home/e/easybuild/.local/easybuild/build/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/uct/ib/dc/dc_mlx5.c:238
 4 0x0000000000027bba ucs_callbackq_dispatch() /home/e/easybuild/.local/easybuild/build/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/ucs/datastruct/callbackq.h:211
 5 0x0000000000027bba uct_worker_progress() /home/e/easybuild/.local/easybuild/build/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/uct/api/uct.h:2221
 6 0x0000000000027bba ucp_worker_progress() /home/e/easybuild/.local/easybuild/build/UCX/1.8.0/GCCcore-9.3.0/ucx-1.8.0/src/ucp/core/ucp_worker.c:1951
 7 0x0000000000003c27 mca_pml_ucx_progress() ???:0
 8 0x000000000002da7b opal_progress() ???:0
 9 0x00000000000550b5 ompi_request_default_wait() ???:0

The IB card on these nodes is:

Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]

Please note the following:

1) This error does not occur with the ParaView binary package, which uses an MPICH MPI installed with the package.
2) On nodes with a ConnectX-5 IB card, it works fine on multiple nodes.

Has anybody seen this before?

Glenn (Gedaliah) Wolosh, Ph.D.
Ass't Director Research Software and Cloud Computing
Acad & Research Computing Systems
[email protected] • (973) 596-5437

