Hi again,

Sorry for the late reply. Believe it or not, the internet connection is terrible here.
The failure happens whenever the number of degrees of freedom is 96 million or higher, and it happens every time I run the program. I have had this issue on another cluster, but there I managed to avoid it by using the so-called fat nodes, which provide 128 GB of RAM. Here all nodes have 64 GB of RAM. The interesting thing is that if I use one core on each node, the program runs successfully, but if I use more than one core per node, it fails.

I don't think this is due to a loose cable. As for the MPI configuration, I don't know how to test other settings.

On Wednesday, November 9, 2016 at 9:29:09 PM UTC+3, Wolfgang Bangerth wrote:
>
> On 11/09/2016 08:36 AM, Ashkan Dorostkar wrote:
> >
> > [n49422:9059] *** An error occurred in MPI_Allreduce
> > [n49422:9059] *** reported by process [3040346113,140733193388063]
> > [n49422:9059] *** on communicator MPI_COMM_WORLD
> > [n49422:9059] *** MPI_ERR_IN_STATUS: error code in status
> > [n49422:9059] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > [n49422:9059] *** and potentially your MPI job)
> >
> > This looks like a communication issue or a communication oversaturation.
> > Does anyone have any experience with this?
>
> Ashkan,
> we have had people report such issues before, on an intermittent basis.
> It is of course not impossible that this points to an actual problem in
> deal.II (or PETSc, or Trilinos), but it is difficult to know for sure
> without a backtrace where this came from.
>
> Does it happen every time with the same program? If it doesn't, it is
> also possible that this is a symptom of a loose cable or a wrong
> configuration of the MPI system -- both things we have seen in cluster
> in the past.
>
> Best
> W.
>
> --
> ------------------------------------------------------------------------
> Wolfgang Bangerth          email: [email protected]
>                            www: http://www.math.colostate.edu/~bangerth/

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
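
The pattern described above (128 GB nodes work, 64 GB nodes work only with one core per node) might point to per-node memory rather than to the network, although nothing in this thread confirms that. A minimal sketch for checking it, assuming a Linux cluster where /proc/self/status is readable: each process reads its own peak resident memory. The helper names `parse_status_kb` and `peak_resident_kb` are hypothetical and not part of deal.II or MPI.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Return the value (in kB) of a "Key:   12345 kB" line from text in the
// format of Linux's /proc/<pid>/status, or -1 if the key is not present
// or its value cannot be parsed.
long parse_status_kb(std::istream &in, const std::string &key)
{
  std::string line;
  while (std::getline(in, line))
    {
      // Match "key" followed directly by ':' at the start of the line.
      if (line.compare(0, key.size(), key) == 0 &&
          line.size() > key.size() && line[key.size()] == ':')
        {
          std::istringstream fields(line.substr(key.size() + 1));
          long kb = -1;
          if (fields >> kb)
            return kb;
          return -1;
        }
    }
  return -1;
}

// Peak resident set size ("VmHWM") of the calling process in kB.
// Linux only; returns -1 if /proc/self/status cannot be opened.
long peak_resident_kb()
{
  std::ifstream status("/proc/self/status");
  return status ? parse_status_kb(status, "VmHWM") : -1;
}
```

Calling `peak_resident_kb()` on every MPI rank shortly before the failing step and printing the result together with the rank would show whether the sum over the ranks on one node approaches 64 GB. If it does not, memory is probably not the cause and a backtrace remains the better lead.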
