The file system I am mounting via NFS is an ordinary Linux file system; it 
is not an HPC parallel file system such as Lustre or anything similar.
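
One way to double-check which file system type actually backs a given 
directory on Linux is to look it up in /proc/mounts. The following is only 
a minimal sketch under that assumption. It is not part of deal.II or 
step-69, and the helper names parse_mounts and fs_type_for are made up for 
illustration.

```python
import os
import sys


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs.

    Note: mount points containing spaces are escaped (e.g. \\040) in
    /proc/mounts; this sketch does not decode those escapes.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for point, fs_type in mounts:
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if len(point) >= len(best_point):
                best_point, best_type = point, fs_type
    return best_type


def main():
    # Hypothetical usage: pass the output directory as the first argument.
    target = os.path.abspath(sys.argv[1] if len(sys.argv) > 1 else ".")
    try:
        with open("/proc/mounts") as f:
            mounts = parse_mounts(f.read())
    except OSError:
        print("could not read /proc/mounts (Linux only)")
        return
    print(f"{target}: {fs_type_for(target, mounts)}")


if __name__ == "__main__":
    main()
```

Run on each node with the directory the program writes into, a result such 
as nfs or nfs4 would confirm that the output goes to the NFS mount rather 
than to a local disk.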

I commented out the call to checkpoint() as you suggested and was able to 
run the code on 4 nodes (each with 4 cores). It finished very quickly.

My mpirun command line looks like this:

mpirun --host pc1,pc2,pc3,pc4 \
  --mca btl_tcp_if_include 192.168.0.0/24 \
  --mca btl tcp,self \
  /nfs/systems/dealii/head-bost_1_70_0/examples/step-69/step-69.release

It is unlikely that I will have the resources to set up a Lustre-like 
parallel file system. Do you have any additional suggestions that might 
allow me to enable checkpointing?

Cheers
On Friday, 3 September 2021 at 13:18:32 UTC-7 Matthias Maier wrote:

> Hi Nicholas,
>
> On Fri, Sep 3, 2021, at 12:49 CDT, Nicholas Yue <[email protected]> 
> wrote:
>
> > Hi
> >
> > It seems to be consistently failing when writing the checkpoint file(s)
> >
> > Are there special flags I need to set up for some form of parallel IO 
> > that may be happening?
>
> > [...]
>
> > Additional information: 
> > deal.II encountered an error while calling an MPI function.
> > The description of the error provided by MPI is "MPI_ERR_FILE: invalid
> > file".
> > The numerical value of the original error code is 30.
>
> This is interesting. It seems that MPI IO is failing.
>
> Do you write into a distributed file system that is replicated among nodes?
>
> Would you mind testing running the code with checkpointing disabled,
> something like:
>
>
> diff --git a/examples/step-69/step-69.cc b/examples/step-69/step-69.cc
> index 4a801f97ba..4b7c9a2f63 100644
> --- a/examples/step-69/step-69.cc
> +++ b/examples/step-69/step-69.cc
> @@ -2595,7 +2595,7 @@ namespace Step69
>
> if (t > output_cycle * output_granularity)
> {
> - checkpoint(U, base_name, t, output_cycle);
> + // checkpoint(U, base_name, t, output_cycle);
> output(U, base_name, t, output_cycle);
> ++output_cycle;
> }
>
>
> I am interested in seeing whether the solution output (into vtu) works.
>
>
> Best,
> Matthias
>

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dealii/eb260e43-8c80-49e4-8927-070fbbb106f0n%40googlegroups.com.
