Hello all,

I am running a linear elasticity simulation on the Lomonosov2 cluster (rank 41 in the Top500), which has an InfiniBand network. For problems larger than 90 million unknowns, Open MPI simply aborts the program with this message:
[n49422:9059] *** An error occurred in MPI_Allreduce
[n49422:9059] *** reported by process [3040346113,140733193388063]
[n49422:9059] *** on communicator MPI_COMM_WORLD
[n49422:9059] *** MPI_ERR_IN_STATUS: error code in status
[n49422:9059] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[n49422:9059] ***    and potentially your MPI job)

This looks like a communication issue, or possibly communication oversaturation. Does anyone have experience with this?

Best,
Ashkan

P.S. I tried the following setting to work around the issue, but it doesn't help either:

export OMPI_MCA_btl_openib_receive_queues="X,256,1024:X,65536,128"

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
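A hedged suggestion, not from the original post: besides setting the `OMPI_MCA_` environment variable, Open MPI MCA parameters can be passed directly on the `mpirun` command line, and `ompi_info` can be used to confirm the value the openib BTL actually sees. The application name and process count below are placeholders, and the exact `ompi_info` flags may vary between Open MPI versions.

```shell
# Sketch only: pass the receive-queues setting via --mca so the launcher
# applies it to every rank; "./my_dealii_app" and "-np 1024" are
# placeholders, not taken from the original post.
mpirun -np 1024 \
    --mca btl_openib_receive_queues "X,256,1024:X,65536,128" \
    ./my_dealii_app

# Check which value Open MPI actually reports for the parameter
# (the --level option exists in Open MPI 1.7 and later):
ompi_info --param btl openib --level 9 | grep receive_queues
```

If the environment variable was being ignored (for example, not exported on the compute nodes by the batch system), the command-line form rules that out as a cause.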
