On Fri, Jul 1, 2016 at 12:34 PM, 张江 <zhangjiang.d...@gmail.com> wrote:
> Hi,
>
> I am trying to read a large data set (11.7 GB) with libMesh and use it in my
> application. The program runs well when using just one process.
> But in parallel (mpirun -n 4), after running for a while, it fails with the
> following errors:
>
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames
> ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016
> [0]PETSC ERROR: ./ptracer on a arch-linux2-c-debug named compute001 by
> jiangzhang Fri Jul 1 10:07:07 2016
> [0]PETSC ERROR: Configure options
> --prefix=/nfs/proj-tpeterka/jiang/opt/petsc-3.7.2 --download-fblaslapack
> --with-mpi-dir=/nfs/proj-tpeterka/jiang/libraries/mpich-3.2
> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>
> Does anybody know the possible causes?
>
It's impossible to know the cause without a stack trace. Also, you should
start with a much smaller problem while you are debugging, if possible.
It's a segfault, so it's most likely caused by accessing past the end of an
array.
Since your code works in serial, you are probably accessing a value in a
parallel vector that does not exist on the current processor.
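
To illustrate the failure mode described above, here is a minimal sketch
in plain C++ with no MPI. LocalSlice and make_slice are hypothetical names
invented for this example; they are not libMesh or PETSc API. The sketch
assumes a contiguous block partition, where each rank stores only its own
range of global indices. PETSc reports that range through
VecGetOwnershipRange, and your application's actual layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one rank's piece of a distributed vector.
// NOT the libMesh or PETSc API; it only mimics the ownership idea.
struct LocalSlice {
  std::size_t first;           // first global index owned by this rank
  std::size_t last;            // one past the last global index owned
  std::vector<double> values;  // storage for [first, last) only

  bool owns(std::size_t global) const {
    return global >= first && global < last;
  }

  // Checked access: throws instead of reading memory out of range.
  double at(std::size_t global) const {
    if (!owns(global))
      throw std::out_of_range("global index " + std::to_string(global) +
                              " is not owned by this rank");
    return values[global - first];
  }
};

// Assumed layout: contiguous block partition of n_global entries over
// n_procs ranks, with the remainder spread over the lowest ranks.
LocalSlice make_slice(std::size_t n_global, std::size_t n_procs,
                      std::size_t rank) {
  assert(n_procs > 0 && rank < n_procs);
  const std::size_t base = n_global / n_procs;
  const std::size_t extra = n_global % n_procs;
  const std::size_t first = rank * base + (rank < extra ? rank : extra);
  const std::size_t size = base + (rank < extra ? 1 : 0);
  LocalSlice s{first, first + size, std::vector<double>(size)};
  // Fill each entry with its own global index so lookups are easy to check.
  for (std::size_t i = 0; i < size; ++i)
    s.values[i] = static_cast<double>(first + i);
  return s;
}
```

With one process, make_slice(10, 1, 0) owns every index, so at(7) succeeds.
With four processes, rank 0 owns only indices 0 through 2, so at(7) throws.
An unchecked read at that point would run past the end of the local storage,
which is the kind of access that can end in a segfault.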
--
John
_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel