On Fri, Jul 1, 2016 at 12:34 PM, 张江 <zhangjiang.d...@gmail.com> wrote:
> Hi,
>
> I am trying to read a large data (11.7GB) with libmesh and use it for my
> application. The program runs well when using just one process.
> But in parallel (mpirun -n 4), after executing a while, some errors came
> out:
>
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.2, Jun, 05, 2016
> [0]PETSC ERROR: ./ptracer on a arch-linux2-c-debug named compute001 by jiangzhang Fri Jul 1 10:07:07 2016
> [0]PETSC ERROR: Configure options --prefix=/nfs/proj-tpeterka/jiang/opt/petsc-3.7.2 --download-fblaslapack --with-mpi-dir=/nfs/proj-tpeterka/jiang/libraries/mpich-3.2
> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file
> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
>
> Anybody know the possible causes?
>

It's impossible to know the cause without a stack trace. Also you should start off with a much smaller problem while you are debugging if possible.
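A segfault that appears only under mpirun is often an out-of-range access into a distributed vector. The sketch below is a hypothetical, MPI-free model (it is not the libMesh NumericVector or PETSc Vec API): it assumes a simple block partition in which each rank stores only a contiguous slice [first, last) of the global entries. With one process every index is local; with 4 processes an index that was valid in serial can fall outside the slice a rank actually stores. The class name BlockDistributedVector and its methods are illustrative assumptions, although libMesh's NumericVector does expose first_local_index() and last_local_index() for the locally owned range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of a block-distributed vector (no MPI involved).
// Each rank stores only global entries [first_local_, last_local_).
class BlockDistributedVector {
public:
  BlockDistributedVector(std::size_t global_size, std::size_t rank,
                         std::size_t n_procs) {
    assert(n_procs > 0 && rank < n_procs);
    // Split as evenly as possible; the first `rem` ranks get one extra entry.
    const std::size_t base = global_size / n_procs;
    const std::size_t rem = global_size % n_procs;
    first_local_ = rank * base + (rank < rem ? rank : rem);
    last_local_ = first_local_ + base + (rank < rem ? 1 : 0);
    local_.assign(last_local_ - first_local_, 0.0);
  }

  std::size_t first_local_index() const { return first_local_; }
  std::size_t last_local_index() const { return last_local_; }

  // True if this rank actually stores the given global entry.
  bool is_local(std::size_t global_i) const {
    return global_i >= first_local_ && global_i < last_local_;
  }

  // Checked access. Indexing local_[global_i - first_local_] without this
  // check is exactly the out-of-range read that can end in SEGV.
  double& at(std::size_t global_i) {
    if (!is_local(global_i))
      throw std::out_of_range("global index not stored on this rank");
    return local_[global_i - first_local_];
  }

private:
  std::size_t first_local_ = 0;
  std::size_t last_local_ = 0;
  std::vector<double> local_;
};
```

For a 10-entry vector on 4 ranks, rank 0 stores [0, 3) and rank 3 stores [8, 10), so asking rank 0 for entry 5 fails, whereas the same request succeeds on a single process. Checking ownership before each access (or running a small case under valgrind, as the PETSc message suggests) turns a silent out-of-range read into a clear error.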
It's a segfault, so it's usually caused by accessing past the end of an array. Since your code works in serial, it probably means you are accessing a value in a parallel vector that is not stored on the processor doing the access.

--
John

_______________________________________________
Libmesh-users mailing list
Libmesh-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-users