Actually, there is a missing check of the return code here:
https://bitbucket.org/fenics-project/dolfin/src/dd945d70e9a7c8548b4cb88fe8bdb2abe2198b29/dolfin/nls/PETScTAOSolver.cpp?at=master&fileviewer=file-view-default#PETScTAOSolver.cpp-263

Jan


On Thu, 5 Nov 2015 16:14:09 +0100
Jan Blechta <[email protected]> wrote:

> I can reproduce it in step 6683 on 3 processes, but I have no idea why
> this happens. Unfortunately I don't currently have a PETSc build with
> debugging enabled, so it is hard to investigate.
> 
> Backtrace on one of processes:
> =================================================================================
> Breakpoint 1, 0x00007fffecea4830 in PetscError ()
>    from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> (gdb) bt
> #0  0x00007fffecea4830 in PetscError ()
>    from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> #1  0x00007fffecfa0989 in VecAssemblyBegin_MPI ()
>    from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> #2  0x00007fffecf7b8f7 in VecAssemblyBegin ()
>    from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> #3  0x00007fffeedf9631 in dolfin::PETScVector::apply (
>     this=this@entry=0x1e8e6a0, mode="insert")
>     at ../../dolfin/la/PETScVector.cpp:319
> #4  0x00007fffeedf92b3 in dolfin::PETScVector::zero (this=0x1e8e6a0)
>     at ../../dolfin/la/PETScVector.cpp:342
> #5  0x00007fffeec0de31 in dolfin::PETScTAOSolver::solve (
>     this=this@entry=0x2173000, optimisation_problem=..., x=...,
> lb=..., ub=...) at ../../dolfin/nls/PETScTAOSolver.cpp:266
> #6  0x00007fffeec0edae in dolfin::PETScTAOSolver::solve (
>     this=this@entry=0x2173000, optimisation_problem=..., x=...)
>     at ../../dolfin/nls/PETScTAOSolver.cpp:177
> #7  0x00007fffd8d157f2 in _wrap_PETScTAOSolver_solve__SWIG_1 (
>     swig_obj=0x7fffffffc210, nobjs=3) at modulePYTHON_wrap.cxx:41488
> #8  _wrap_PETScTAOSolver_solve (self=<optimized out>, args=<optimized
> out>) at modulePYTHON_wrap.cxx:41521
> #9  0x00000000004d2017 in PyEval_EvalFrameEx ()
> #10 0x00000000004cb6b1 in PyEval_EvalCodeEx ()
> =================================================================================
> 
> and other processes:
> =================================================================================
> Program received signal SIGINT, Interrupt.
> 0x00007ffff78dba77 in sched_yield ()
> at ../sysdeps/unix/syscall-template.S:81
> 81      ../sysdeps/unix/syscall-template.S: No such file or directory.
> (gdb) bt
> #0  0x00007ffff78dba77 in sched_yield ()
>     at ../sysdeps/unix/syscall-template.S:81
> #1  0x00007fffecb1307d in opal_progress () from /usr/lib/libmpi.so.1
> #2  0x00007fffeca58e44 in ompi_request_default_wait_all ()
>     from /usr/lib/libmpi.so.1
> #3  0x00007fffd797dab2 in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
>     from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
> #4  0x00007fffeca6542b in PMPI_Allreduce () from /usr/lib/libmpi.so.1
> #5  0x00007fffecf97f6a in VecDot_MPI ()
>     from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> #6  0x00007fffecf7e8b1 in VecDot ()
>     from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> #7  0x00007fffed70d9b1 in TaoSolve_CG ()
>     from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> #8  0x00007fffed6f0847 in TaoSolve ()
>     from /home/jan/dev/hashstack/fenics-deps.host-debian/lib/libpetsc.so.3.6
> #9  0x00007fffeec0de25 in dolfin::PETScTAOSolver::solve (
>     this=this@entry=0x1fa1a40, optimisation_problem=..., x=..., lb=...,
>     ub=...) at ../../dolfin/nls/PETScTAOSolver.cpp:263
> #10 0x00007fffeec0edae in dolfin::PETScTAOSolver::solve (
>     this=this@entry=0x1fa1a40, optimisation_problem=..., x=...)
>     at ../../dolfin/nls/PETScTAOSolver.cpp:177
> #11 0x00007fffd8d157f2 in _wrap_PETScTAOSolver_solve__SWIG_1 (
> =================================================================================
> 
> Jan
> 
> 
> On Thu, 5 Nov 2015 16:30:50 +0200
> Giorgos Grekas <[email protected]> wrote:
> 
> > Hello again,
> > 
> > I would like to ask whether the bug reported in this mail is
> > scheduled to be fixed in the coming months.
> > 
> > Thank you in advance, and thanks for your great support.
> > 
> > On Mon, Oct 12, 2015 at 6:42 PM, Jan Blechta
> > <[email protected]> wrote:
> > 
> > > On Mon, 12 Oct 2015 17:15:18 +0300
> > > Giorgos Grekas <[email protected]> wrote:
> > >
> > > > I have attached a backtrace in the file bt.txt, along with my
> > > > code. To run my code, execute the file runMe.py.
> > >
> > > This code fails with assertion in mshr:
> > >
> > > *** Error:   Unable to complete call to function add_simple_polygon().
> > > *** Reason:  Assertion !i.second failed.
> > > *** Where:   This error was encountered inside
> > >             ../src/CSGCGALDomain2D.cpp (line 488).
> > >
> > > This seems like a trivial bug. Could you fix it Benjamin?
> > >
> > > Jan
> > >
> > > >
> > > >
> > > > On Mon, Oct 12, 2015 at 4:40 PM, Jan Blechta
> > > > <[email protected]> wrote:
> > > >
> > > > > PETSc error code 1 does not seem to indicate an expected
> > > > > problem,
> > > > > http://www.mcs.anl.gov/petsc/petsc-dev/include/petscerror.h.html.
> > > > > It seems to be an error not handled by PETSc.
> > > > >
> > > > > You could provide us with your code, or try investigating the
> > > > > problem with a debugger
> > > > >
> > > > >   $ mpirun -n 3 xterm -e gdb -ex 'set breakpoint pending on' \
> > > > >       -ex 'break PetscError' -ex 'break dolfin::dolfin_error' \
> > > > >       -ex r -args python your_script.py
> > > > >   ...
> > > > >   Break point hit...
> > > > >   (gdb) bt
> > > > >
> > > > > and post a backtrace here.
> > > > >
> > > > > Jan
> > > > >
> > > > >
> > > > > On Mon, 12 Oct 2015 15:16:48 +0300
> > > > > Giorgos Grekas <[email protected]> wrote:
> > > > >
> > > > > > Hello,
> > > > > > I am using the NCG method from the TAO solver, and I wanted
> > > > > > to test my code's validity on a PC with 4 processors before
> > > > > > executing it on a cluster. When I run my code with 2 processes
> > > > > > (mpirun -np 2) everything seems to work fine, but when I use
> > > > > > 3 or more processes I get the following error:
> > > > > >
> > > > > >
> > > > > > *** Error:   Unable to successfully call PETSc function
> > > > > > ***          'VecAssemblyBegin'.
> > > > > > *** Reason:  PETSc error code is: 1.
> > > > > > *** Where:   This error was encountered inside
> > > > > > ***          /home/ggrekas/.hashdist/tmp/dolfin-wphma2jn5fuw/dolfin/la/PETScVector.cpp.
> > > > > > *** Process: 3
> > > > > > ***
> > > > > > *** DOLFIN version: 1.7.0dev
> > > > > > *** Git changeset:  3fbd47ec249a3e4bd9d055f8a01b28287c5bcf6a
> > > > > > ***
> > > > > > -------------------------------------------------------------------------
> > > > > >
> > > > > > ===================================================================================
> > > > > > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > > > > > =   EXIT CODE: 134
> > > > > > =   CLEANING UP REMAINING PROCESSES
> > > > > > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > > > > > ===================================================================================
> > > > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> > > > > > This typically refers to a problem with your application.
> > > > > > Please see the FAQ page for debugging suggestions.
> > > > > >
> > > > > > So, is this an issue that I should report to the TAO team?
> > > > > >
> > > > > > Thank you in advance.
> > > > >
> > > > >
> > >
> > >
> 
> _______________________________________________
> fenics-support mailing list
> [email protected]
> http://fenicsproject.org/mailman/listinfo/fenics-support

