I observed that BoomerAMG eventually fails when running on 3, 5, 6 or 7
processes. When using 1, 2, 4 or 8 processes, it is fine. Strangely
enough, nobody has seen it but me, even though I can reproduce it very
easily:
  $ np=3  # or 5, 6, 7
  $ export DOLFIN_NOPLOT=1
  $ mpirun -n $np demo_navier-stokes
with FEniCS 1.0.0 + PETSc 3.2 and with FEniCS dev + PETSc 3.4. After a
few time steps PETSc fails and DOLFIN deadlocks.

In this demo, PETSc throws the error when solving the projection step,
i.e. a Poisson problem with both Dirichlet and zero Neumann conditions,
discretized by piecewise linears on triangles.
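
For context, if I understand correctly, the PETSc-level solve that
DOLFIN performs for this step should reduce to roughly the following
(a minimal sketch, not DOLFIN's actual code; I am assuming an already
assembled Mat A and Vec b, and the four-argument KSPSetOperators() of
PETSc 3.4):

  #include <petscksp.h>

  /* Sketch: solve A x = b with CG preconditioned by Hypre BoomerAMG,
     roughly what DOLFIN's KrylovSolver sets up underneath. */
  static PetscErrorCode solve_projection_step(Mat A, Vec b, Vec x)
  {
    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
    ierr = PCHYPRESetType(pc, "boomeramg");CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr); /* honour run-time options */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Thanks to KSPSetFromOptions(), the same run-time options (-ksp_type,
-pc_hypre_boomeramg_* etc.) should then apply here just as in a
standalone PETSc example.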

Regarding the effort to reproduce it with PETSc directly, Jed: I was
able to dump this specific matrix to binary format, but not the vector,
so I need to somehow obtain the vector in binary form - is that binary
format documented somewhere?
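
If there is no convenient run-time option for it, I guess dumping the
vector directly through the PETSc API would look roughly like this (an
untested sketch; the file name "rhs.bin" is just my own choice, and
VecLoad() with a FILE_MODE_READ viewer should read the file back in):

  #include <petscvec.h>

  /* Sketch: write Vec b to the PETSc binary file "rhs.bin". */
  static PetscErrorCode dump_rhs(Vec b)
  {
    PetscViewer    viewer;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "rhs.bin",
                                 FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    ierr = VecView(b, viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }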

I guess I would need to recompile PETSc in some debug mode
(--with-debugging=1?) to be able to break into Hypre, is that so? This
is the backtrace from the process printing the PETSc ERROR:
__________________________________________________________________________
#0  0x00007ffff5caa2d8 in __GI___poll (fds=0x6d02c0, nfds=6, 
    timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
#1  0x00007fffed0c5ab0 in ?? () from /usr/lib/libopen-pal.so.0
#2  0x00007fffed0c48ff in ?? () from /usr/lib/libopen-pal.so.0
#3  0x00007fffed0b9221 in opal_progress ()
   from /usr/lib/libopen-pal.so.0
#4  0x00007ffff1b593d5 in ?? () from /usr/lib/libmpi.so.0
#5  0x00007ffff1b8a1c5 in PMPI_Waitany () from /usr/lib/libmpi.so.0
#6  0x00007ffff2f5c43e in VecScatterEnd_1 ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#7  0x00007ffff2f57811 in VecScatterEnd ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#8  0x00007ffff2f3cb9a in VecGhostUpdateEnd ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#9  0x00007ffff74ecdea in dolfin::Assembler::assemble (this=0x7fffffff9da0, A=..., a=...)
    at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/fem/Assembler.cpp:96
#10 0x00007ffff74e8095 in dolfin::assemble (A=..., a=...)
    at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/fem/assemble.cpp:38
#11 0x0000000000425d41 in main ()
    at /usr/users/blechta/fenics/fenics/src/dolfin/demo/pde/navier-stokes/cpp/main.cpp:180
__________________________________________________________________________


This is the backtrace from one of the deadlocked processes:
______________________________________________________________________
#0  0x00007ffff5caa2d8 in __GI___poll (fds=0x6d02c0, nfds=6, 
    timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
#1  0x00007fffed0c5ab0 in ?? () from /usr/lib/libopen-pal.so.0
#2  0x00007fffed0c48ff in ?? () from /usr/lib/libopen-pal.so.0
#3  0x00007fffed0b9221 in opal_progress ()
   from /usr/lib/libopen-pal.so.0
#4  0x00007fffdb131a1d in ?? ()
   from /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so
#5  0x00007fffd9220db9 in ?? ()
   from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
#6  0x00007ffff1b6dee9 in PMPI_Allreduce () from /usr/lib/libmpi.so.0
#7  0x00007ffff2e7aa74 in PetscSplitOwnership ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#8  0x00007ffff2eee129 in PetscLayoutSetUp ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#9  0x00007ffff2f31cf7 in VecCreate_MPI_Private ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#10 0x00007ffff2f32092 in VecCreate_MPI ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#11 0x00007ffff2f234f7 in VecSetType ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#12 0x00007ffff2f32708 in VecCreate_Standard ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#13 0x00007ffff2f234f7 in VecSetType ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#14 0x00007ffff2fb75a1 in MatGetVecs ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#15 0x00007ffff335fdc6 in PCSetUp_HYPRE ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#16 0x00007ffff3362cd6 in PCSetUp ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#17 0x00007ffff33f676e in KSPSetUp ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#18 0x00007ffff33f7bfe in KSPSolve ()
   from /usr/local/pkg/petsc/3.4.0/gnu/lib/libpetsc.so
#19 0x00007ffff77082f4 in dolfin::PETScKrylovSolver::solve (this=0x9700f0, x=..., b=...)
    at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/la/PETScKrylovSolver.cpp:445
#20 0x00007ffff7709228 in dolfin::PETScKrylovSolver::solve (this=0x9700f0, A=..., x=..., b=...)
    at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/la/PETScKrylovSolver.cpp:491
#21 0x00007ffff76d9303 in dolfin::KrylovSolver::solve (this=0x94a8e0, A=..., x=..., b=...)
    at /usr/users/blechta/fenics/fenics/src/dolfin/dolfin/la/KrylovSolver.cpp:147
#22 0x00007ffff76f4b91 in dolfin::LinearSolver::solve (this=0x7fffffff9c40, A=..., x=..., b=...)
_____________________________________________________________________________________


On Wed, 29 May 2013 11:19:53 -0500, Jed Brown <[email protected]> wrote:
> Jan Blechta <[email protected]> writes:
> 
> > Maybe this is the PETSc stack from the previous time step - this is
> > provided by DOLFIN.
> >
> >> Maybe you aren't checking error codes and are trying to do
> >> something else collective?
> >
> > I don't know, I'm just using FEniCS.
> 
> When I said "you", I was addressing the list in general, which
> includes FEniCS developers.
> 
> >> > [2]PETSC ERROR: PCDestroy() line 121 in /petsc-3.4.0/src/ksp/pc/interface/precon.c
> >> > [2]PETSC ERROR: KSPDestroy() line 788 in /petsc-3.4.0/src/ksp/ksp/interface/itfunc.c
> >> >
> >> > and deadlocks. Have you seen it before? Where could the problem be?
> >> 
> >> Deadlock must be back in your code.  This error occurs on
> >> PETSC_COMM_SELF, which means we have no way to ensure that the
> >> error condition is collective.  You can't just go calling other
> >> collective functions after such an error.
> >
> > This means that DOLFIN handles some error condition poorly.
> 
> It appears that way, but that appears to be independent of whatever
> causes Hypre to return an error.
> 
> >> Anyway, please set up a reproducible test case and/or get a trace
> >> from inside Hypre.  It will be useful for them to debug the
> >> problem.
> >
> > I'm not a PETSc user, so it would be quite time-consuming for me to
> > try to reproduce it without FEniCS. I will at least try to get a trace.
> 
> You can try dumping the matrix using '-ksp_view_mat binary' (writes
> 'binaryoutput'), for example, then try solving it using a PETSc
> example, e.g. src/ksp/ksp/examples/tutorials/ex10.c with the same
> configuration via run-time options.
