I can reproduce it at step 6683 on 3 processes, but I have no idea why
this happens. Unfortunately, I don't currently have a PETSc build with
debugging enabled, so it is hard to investigate.
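
An alternative to running every rank under gdb, assuming the petsc4py in
this stack is built against the same libpetsc as DOLFIN and exposes
Sys.pushErrorHandler, would be to switch PETSc's error handler from the
script itself; a rough sketch:

    # Sketch only: assumes petsc4py shares DOLFIN's PETSc library.
    from dolfin import *            # let DOLFIN set up PETSc as usual
    from petsc4py import PETSc
    # Attach a debugger to whichever rank raises the PETSc error, instead
    # of only seeing DOLFIN's "PETSc error code is: 1" message.
    PETSc.Sys.pushErrorHandler("debugger")

That should drop the failing rank into gdb at the point where
VecAssemblyBegin returns the error, even without a debugging PETSc build.
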
Backtrace on one of the processes:
=================================================================================
Breakpoint 1, 0x00007fffecea4830 in PetscError ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
(gdb) bt
#0 0x00007fffecea4830 in PetscError ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
#1 0x00007fffecfa0989 in VecAssemblyBegin_MPI ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
#2 0x00007fffecf7b8f7 in VecAssemblyBegin ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
#3 0x00007fffeedf9631 in dolfin::PETScVector::apply (
this=this@entry=0x1e8e6a0, mode="insert")
at ../../dolfin/la/PETScVector.cpp:319
#4 0x00007fffeedf92b3 in dolfin::PETScVector::zero (this=0x1e8e6a0)
at ../../dolfin/la/PETScVector.cpp:342
#5 0x00007fffeec0de31 in dolfin::PETScTAOSolver::solve (
this=this@entry=0x2173000, optimisation_problem=..., x=..., lb=...,
ub=...) at ../../dolfin/nls/PETScTAOSolver.cpp:266
#6 0x00007fffeec0edae in dolfin::PETScTAOSolver::solve (
this=this@entry=0x2173000, optimisation_problem=..., x=...)
at ../../dolfin/nls/PETScTAOSolver.cpp:177
#7 0x00007fffd8d157f2 in _wrap_PETScTAOSolver_solve__SWIG_1 (
swig_obj=0x7fffffffc210, nobjs=3) at modulePYTHON_wrap.cxx:41488
#8 _wrap_PETScTAOSolver_solve (self=<optimized out>,
args=<optimized out>) at modulePYTHON_wrap.cxx:41521
#9 0x00000000004d2017 in PyEval_EvalFrameEx ()
#10 0x00000000004cb6b1 in PyEval_EvalCodeEx ()
=================================================================================
and on the other processes:
=================================================================================
Program received signal SIGINT, Interrupt.
0x00007ffff78dba77 in sched_yield ()
at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff78dba77 in sched_yield ()
at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fffecb1307d in opal_progress () from /usr/lib/libmpi.so.1
#2 0x00007fffeca58e44 in ompi_request_default_wait_all ()
from /usr/lib/libmpi.so.1
#3 0x00007fffd797dab2 in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
#4 0x00007fffeca6542b in PMPI_Allreduce () from /usr/lib/libmpi.so.1
#5 0x00007fffecf97f6a in VecDot_MPI ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
#6 0x00007fffecf7e8b1 in VecDot ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
#7 0x00007fffed70d9b1 in TaoSolve_CG ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
#8 0x00007fffed6f0847 in TaoSolve ()
from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
#9 0x00007fffeec0de25 in dolfin::PETScTAOSolver::solve (
this=this@entry=0x1fa1a40, optimisation_problem=..., x=..., lb=...,
ub=...) at ../../dolfin/nls/PETScTAOSolver.cpp:263
#10 0x00007fffeec0edae in dolfin::PETScTAOSolver::solve (
this=this@entry=0x1fa1a40, optimisation_problem=..., x=...)
at ../../dolfin/nls/PETScTAOSolver.cpp:177
#11 0x00007fffd8d157f2 in _wrap_PETScTAOSolver_solve__SWIG_1 (
=================================================================================
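
If I read the traces right, the ranks are no longer in the same collective
operation: one rank has hit PetscError inside VecAssemblyBegin, reached
from PETScVector::zero called at PETScTAOSolver.cpp:266, while the others
are blocked in the MPI_Allreduce behind VecDot inside TaoSolve_CG, entered
from PETScTAOSolver.cpp:263. A quick way to check this from Python would
be to log rank and step around the solve call; a rough sketch, where
solver, problem, x, lb, ub and step stand for the corresponding objects
in runMe.py:

    # Diagnostic sketch only; solver, problem, x, lb, ub and step are
    # placeholders for the corresponding objects in runMe.py.
    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()
    print("rank %d: entering TAO solve at step %d" % (rank, step))
    solver.solve(problem, x, lb, ub)
    # If some rank never reaches this line while another does, the ranks
    # have diverged inside TaoSolve and later collectives will mismatch.
    print("rank %d: TAO solve returned at step %d" % (rank, step))
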
Jan
On Thu, 5 Nov 2015 16:30:50 +0200
Giorgos Grekas <[email protected]> wrote:
> Hello again,
>
> I would like to ask whether the bug reported in this mail is
> scheduled to be fixed in the following months.
>
> Thank you in advance and for your great support.
>
> On Mon, Oct 12, 2015 at 6:42 PM, Jan Blechta
> <[email protected]> wrote:
>
> > On Mon, 12 Oct 2015 17:15:18 +0300
> > Giorgos Grekas <[email protected]> wrote:
> >
> > > I provide the backtrace in the file bt.txt, together with my code.
> > > To run my code, run the file runMe.py.
> >
> > This code fails with an assertion in mshr:
> >
> > *** Error: Unable to complete call to function add_simple_polygon().
> > *** Reason: Assertion !i.second failed.
> > *** Where: This error was encountered inside ../src/CSGCGALDomain2D.cpp (line 488).
> >
> > This seems like a trivial bug. Could you fix it, Benjamin?
> >
> > Jan
> >
> > >
> > >
> > > On Mon, Oct 12, 2015 at 4:40 PM, Jan Blechta
> > > <[email protected]> wrote:
> > >
> > > > PETSc error code 1 does not seem to indicate an expected problem; see
> > > > http://www.mcs.anl.gov/petsc/petsc-dev/include/petscerror.h.html.
> > > > It seems to be an error not handled by PETSc.
> > > >
> > > > You could provide us with your code, or try investigating the
> > > > problem with a debugger:
> > > >
> > > > $ mpirun -n 3 xterm -e gdb -ex 'set breakpoint pending on' -ex
> > > > 'break PetscError' -ex 'break dolfin::dolfin_error' -ex r -args
> > > > python your_script.py
> > > > ...
> > > > Break point hit...
> > > > (gdb) bt
> > > >
> > > > and post a backtrace here.
> > > >
> > > > Jan
> > > >
> > > >
> > > > On Mon, 12 Oct 2015 15:16:48 +0300
> > > > Giorgos Grekas <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > > I am using the NCG solver from TAO, and I wanted to test the
> > > > > validity of my code on a PC with 4 processors before running it
> > > > > on a cluster. When I run my code with 2 processes (mpirun -np 2)
> > > > > everything seems to work fine, but when I use 3 or more processes
> > > > > I get the following error:
> > > > >
> > > > >
> > > > > *** Error: Unable to successfully call PETSc function 'VecAssemblyBegin'.
> > > > > *** Reason: PETSc error code is: 1.
> > > > > *** Where: This error was encountered inside
> > > > > /home/ggrekas/.hashdist/tmp/dolfin-wphma2jn5fuw/dolfin/la/PETScVector.cpp.
> > > > > *** Process: 3
> > > > > ***
> > > > > *** DOLFIN version: 1.7.0dev
> > > > > *** Git changeset: 3fbd47ec249a3e4bd9d055f8a01b28287c5bcf6a
> > > > > ***
> > > > >
> > > > > -------------------------------------------------------------------------
> > > > >
> > > > >
> > > > > ===================================================================================
> > > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > > > > = EXIT CODE: 134
> > > > > = CLEANING UP REMAINING PROCESSES
> > > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > > > > ===================================================================================
> > > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> > > > > This typically refers to a problem with your application.
> > > > > Please see the FAQ page for debugging suggestions
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > So, is this an issue that I must report to the TAO team?
> > > > >
> > > > > Thank you in advance.
> > > >
> > > >
> >
> >
_______________________________________________
fenics-support mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics-support