The file system I am mounting via NFS is an ordinary Linux file system; it is not an HPC parallel file system such as Lustre or anything similar.
I tried commenting out the call to checkpoint() as you suggested and was able to run the code on 4 nodes (each with 4 cores); it finished very quickly.

My mpirun command line looks like this:

    mpirun --host pc1,pc2,pc3,pc4 --mca btl_tcp_if_include 192.168.0.0/24 --mca btl tcp,self /nfs/systems/dealii/head-bost_1_70_0/examples/step-69/step-69.release

It is unlikely that I will have the resources to set up a Lustre-like parallel file system. Do you have any additional suggestions that might allow me to enable checkpointing?

Cheers

On Friday, 3 September 2021 at 13:18:32 UTC-7 Matthias Maier wrote:

> Hi Nicholas,
>
> On Fri, Sep 3, 2021, at 12:49 CDT, Nicholas Yue <[email protected]>
> wrote:
>
> > Hi
> >
> > It seems to be consistently failing when writing the checkpoint file(s)
> >
> > Are there special flags I need to setup up for some form of parallel IO
> > that may be happening ?
> >
> [...]
>
> > Additional information:
> > deal.II encountered an error while calling an MPI function.
> > The description of the error provided by MPI is "MPI_ERR_FILE: invalid
> > file".
> > The numerical value of the original error code is 30.
>
> This is interesting. It seems that MPI IO is failing.
>
> Do you write into a distributed file system that is replicated among nodes?
>
> Would you mind testing running the code with checkpointing disabled,
> something like:
>
>
> diff --git a/examples/step-69/step-69.cc b/examples/step-69/step-69.cc
> index 4a801f97ba..4b7c9a2f63 100644
> --- a/examples/step-69/step-69.cc
> +++ b/examples/step-69/step-69.cc
> @@ -2595,7 +2595,7 @@ namespace Step69
>
>            if (t > output_cycle * output_granularity)
>              {
> -              checkpoint(U, base_name, t, output_cycle);
> +              // checkpoint(U, base_name, t, output_cycle);
>                output(U, base_name, t, output_cycle);
>                ++output_cycle;
>              }
>
>
> I am interested in seeing whether the solution output (into vtu) works.
>
> Best,
> Matthias
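
One direction that might be worth trying on a plain NFS mount is to avoid collective MPI IO for the checkpoint and let every rank write its own file with ordinary stream I/O. The sketch below only illustrates that idea and assumes the failure does come from MPI IO on NFS, which the thread has not confirmed. It is not the code used in step-69: the names checkpoint_name, write_checkpoint and read_checkpoint, the file-name pattern, and the flat std::vector<double> state are all hypothetical, and the rank is passed in as a plain integer so the example has no MPI dependency.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: one checkpoint file per MPI rank, so that no two
// processes ever write to the same file on the NFS mount.
std::string checkpoint_name(const std::string &base, const int rank)
{
  return base + "-checkpoint.rank_" + std::to_string(rank);
}

// Write the time, the output cycle and the locally owned values of this
// rank as raw binary data using ordinary (non-MPI) file streams.
void write_checkpoint(const std::string         &base,
                      const int                  rank,
                      const double               t,
                      const unsigned int         cycle,
                      const std::vector<double> &local_values)
{
  std::ofstream out(checkpoint_name(base, rank),
                    std::ios::binary | std::ios::trunc);
  if (!out)
    throw std::runtime_error("cannot open checkpoint file for writing");

  const std::size_t n = local_values.size();
  out.write(reinterpret_cast<const char *>(&t), sizeof(t));
  out.write(reinterpret_cast<const char *>(&cycle), sizeof(cycle));
  out.write(reinterpret_cast<const char *>(&n), sizeof(n));
  out.write(reinterpret_cast<const char *>(local_values.data()),
            static_cast<std::streamsize>(n * sizeof(double)));
  if (!out)
    throw std::runtime_error("error while writing checkpoint file");
}

// Read back what write_checkpoint() stored for the same rank.
void read_checkpoint(const std::string   &base,
                     const int            rank,
                     double              &t,
                     unsigned int        &cycle,
                     std::vector<double> &local_values)
{
  std::ifstream in(checkpoint_name(base, rank), std::ios::binary);
  if (!in)
    throw std::runtime_error("cannot open checkpoint file for reading");

  std::size_t n = 0;
  in.read(reinterpret_cast<char *>(&t), sizeof(t));
  in.read(reinterpret_cast<char *>(&cycle), sizeof(cycle));
  in.read(reinterpret_cast<char *>(&n), sizeof(n));
  if (!in)
    throw std::runtime_error("error while reading checkpoint header");

  local_values.resize(n);
  in.read(reinterpret_cast<char *>(local_values.data()),
          static_cast<std::streamsize>(n * sizeof(double)));
  if (!in)
    throw std::runtime_error("error while reading checkpoint data");
}
```

In an MPI program the rank argument would come from MPI_Comm_rank, and a restart would need the same number of ranks and the same partitioning as the run that wrote the files. The raw binary layout is also only portable between machines with the same architecture, which should hold for a homogeneous cluster such as the four nodes above.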
