Dear All,

I just created a public git repository containing the section of the code that 
causes the problem:

https://bitbucket.org/FatemehChe/reaction_diffusion/src/master/

In short, this code is part of a PDE-constrained optimization problem, and this 
function is called at each iteration of an iterative solver. 
As I mentioned before, it randomly fails with the error "Segmentation Violation, 
probably memory access out of range". 
To demonstrate the error, I run this function 10,000 times; you can see the 
output in the file result/mono. 
The project runs without any problem on my local machine (macOS, 2.5 GHz Intel 
Core i7, 4 cores), but whenever I run it on the cluster with more cores, it 
crashes with that error after some iterations.

To run the project on the cluster:
- make
- sh benchmark

Thanks in advance.

Best regards,
Fatemeh



> ---------- Forwarded message ---------
> From: John Peterson <jwpeter...@gmail.com>
> Date: Tue, Nov 13, 2018 at 4:07 PM
> Subject: Re: [Libmesh]: In parallel error :Segmentation Violation, probably 
> memory access out of range
> To: Fatemeh Chegini Salzmann <chegini.fat...@gmail.com>
> 
> 
> Hi Fatemeh,
> 
> Two requests: 
> 
> 1.) please provide your entire application code if possible; we can't debug 
> code snippets.
> 2.) please send help requests like this to libmesh-us...@lists.sf.net or 
> open an issue on GitHub. That way you will be more likely to get help from 
> people with more time and/or insight than me.
> 
> -- 
> John
> 
> 
> On Tue, Nov 13, 2018 at 3:38 AM Fatemeh Chegini Salzmann 
> <chegini.fat...@gmail.com> wrote:
> Dear John,
> 
> I have a couple of transient PDEs that need to be solved at each time 
> iteration, where the solution of one PDE is used to solve the next.
> In brief:
> // ---------------------------------------------------------------------------
>     system_gateVariable.time = 0;
>     system_diffusion.time    = 0;
>     system_reaction.time     = 0;
> 
>     for (unsigned int t_step = 1; t_step < T; t_step++)
>     {
>         system_gateVariable.time += dt_T;
>         system_diffusion.time    += dt_T;
>         system_reaction.time     += dt_T;
> 
>         *system_gateVariable.old_local_solution = *system_gateVariable.current_local_solution;
>         *system_reaction.old_local_solution     = *system_reaction.current_local_solution;
>         *system_diffusion.old_local_solution    = *system_diffusion.current_local_solution;
> 
>         equation_systems.get_system("gateVariable").solve();
>         equation_systems.get_system("reaction").solve();
> 
>         *system_diffusion.old_local_solution = *system_reaction.current_local_solution;
>         equation_systems.get_system("diffusion").solve();
>     }
> // ---------------------------------------------------------------------------
>     
> 
> and in the assembly function, I call old_solution whenever I need to read 
> the previous solution of each PDE, as follows:
> system.old_solution(dof_indices[l]);
> 
> I get this error when I run the project in parallel: "Caught signal number 
> 11 SEGV: Segmentation Violation, probably memory access out of range".
> I even tried calling close() after solving each equation, e.g. 
> equation_systems.get_system("reaction").solution->close(), which didn't help.
> I don't know how to deal with this problem.
> 
> I would appreciate any help/suggestion.
> 
> Thanks in advance,
> Fatemeh
> 
> FYI:
> [1]PETSC ERROR: 
> ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
> probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see 
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on 
> GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: --------------------- Error Message 
> --------------------------------------------------------------
> [1]PETSC ERROR: Signal received
> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html 
> for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
> [1]PETSC ERROR: 
> /home/chegini/inverseProblem/LibmeshCode_InParallel_Cluster/./assemble on a 
> arch-linux2-c-opt named icsnode17 by chegini Tue Nov 13 10:56:57 2018
> [1]PETSC ERROR: Configure options --prefix=/apps/petsc/3.8.3 
> --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 
> --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 
> --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 
> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 
> --CC=mpicc --CXX=mpicxx --FC=mpif90 --F77=mpif77 --F90=mpif90 --CFLAGS="-fPIC 
> -fopenmp" --CXXFLAGS="-fPIC -fopenmp" --FFLAGS="-fPIC -fopenmp" 
> --FCFLAGS="-fPIC -fopenmp" --F90FLAGS="-fPIC -fopenmp" --F77FLAGS="-fPIC 
> -fopenmp" PETSC_DIR=/apps/petsc/3.8.3/src/petsc-3.8.3
> [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
> with errorcode 59.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> In: PMI_Abort(59, N/A)
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> slurmstepd: *** STEP 924595.0 ON icsnode17 CANCELLED AT 2018-11-13T11:00:55 
> ***
> srun: error: icsnode17: tasks 0,3,5-16,18-19: Killed
> srun: error: icsnode17: task 1: Exited with exit code 59
> srun: error: icsnode17: tasks 2,4,17: Killed
> 


_______________________________________________
Libmesh-users mailing list
Libmesh-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-users
