Dear All,

I don't know why, but my email was rejected again, even though I have
already registered this email address.

Could you please let me know if you receive my email?

Best regards,
Fatemeh



On Thu, Nov 22, 2018 at 6:28 PM Fatemeh Chegini Salzmann <
[email protected]> wrote:

> Dear All,
>
> Has anyone had a chance to look into the issue I submitted?
> We would really appreciate any support.
>
> Best regards,
> Fatemeh
>
> On Wed, Nov 14, 2018 at 3:31 PM <[email protected]>
> wrote:
>
>> Your message has been rejected, probably because you are not
>> subscribed to the mailing list and the list's policy is to prohibit
>> non-members from posting to it.  If you think that your messages are
>> being rejected in error, contact the mailing list owner at
>> [email protected].
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Fatemeh Chegini Salzmann <[email protected]>
>> To: [email protected]
>> Date: Wed, 14 Nov 2018 15:31:15 +0100
>> Subject: Fwd: [Libmesh]: In parallel error :Segmentation Violation,
>> probably memory access out of range
>>
>> Dear All,
>>
>> I just created a public Git repository with the section of the code that
>> causes the problem:
>>
>> https://bitbucket.org/FatemehChe/reaction_diffusion/src/master/
>>
>> In short, this code is part of a PDE-constrained optimization problem,
>> and this function is called at each iteration of an iterative solver.
>> As I mentioned before, it randomly gives us the error "Segmentation
>> Violation, probably memory access out of range".
>> That's why I run this function 10,000 times to reproduce the error; you
>> can see the output in the file result/mono.
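>>
>> Just to be explicit about the repetition, here is a minimal sketch of the
>> driver I mean (solve_reaction_diffusion() is only a placeholder name for
>> the function in the repository above):
>>
>> // Hypothetical driver: calling the problematic function repeatedly makes
>> // the intermittent segfault show up; 10,000 matches the run above.
>> for (unsigned int i = 0; i < 10000; i++)
>>   solve_reaction_diffusion(equation_systems); // placeholder name
>>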
>> Although I am able to run it on my local machine (macOS, 2.5 GHz Intel
>> Core i7) with 4 cores without any problem, whenever I run the project on
>> the cluster with more cores it gives me that error after some iterations.
>>
>> To run the project on the cluster:
>> - make
>> - sh benchmark
>>
>> Thanks in advance.
>>
>> Best regards,
>> Fatemeh
>>
>> ---------- Forwarded message ---------
>> From: John Peterson <[email protected]>
>> Date: Tue, Nov 13, 2018 at 4:07 PM
>> Subject: Re: [Libmesh]: In parallel error :Segmentation Violation,
>> probably memory access out of range
>> To: Fatemeh Chegini Salzmann <[email protected]>
>>
>>
>> Hi Fatemeh,
>>
>> Two requests:
>>
>> 1.) Please provide your entire application code if possible; we can't
>> debug code snippets.
>> 2.) Please send help requests like this to [email protected] or
>> open an issue on GitHub. That way you will be more likely to get help
>> from people with more time and/or insight than me.
>>
>> --
>> John
>>
>>
>> On Tue, Nov 13, 2018 at 3:38 AM Fatemeh Chegini Salzmann <
>> [email protected]> wrote:
>>
>>> Dear John,
>>>
>>> I have a couple of transient PDEs that need to be solved at each time
>>> step, where the solution of one PDE is used to solve the next.
>>> In short, the loop looks like this:
>>> // --------------------------------------------------------------------
>>> system_gateVariable.time = 0;
>>> system_diffusion.time    = 0;
>>> system_reaction.time     = 0;
>>>
>>> unsigned int t_step;
>>> for (t_step = 1; t_step < T; t_step++)
>>> {
>>>   system_gateVariable.time += dt_T;
>>>   system_diffusion.time    += dt_T;
>>>   system_reaction.time     += dt_T;
>>>
>>>   // Save the current solutions as the old (previous-step) solutions.
>>>   *system_gateVariable.old_local_solution = *system_gateVariable.current_local_solution;
>>>   *system_reaction.old_local_solution     = *system_reaction.current_local_solution;
>>>   *system_diffusion.old_local_solution    = *system_diffusion.current_local_solution;
>>>
>>>   equation_systems.get_system("gateVariable").solve();
>>>   equation_systems.get_system("reaction").solve();
>>>
>>>   // The diffusion step reads the freshly computed reaction solution.
>>>   *system_diffusion.old_local_solution = *system_reaction.current_local_solution;
>>>   equation_systems.get_system("diffusion").solve();
>>> }
>>> // --------------------------------------------------------------------
>>>
>>>
>>> In the assembly function, I call old_solution() whenever I need to read
>>> the previous solution of each PDE, as follows:
>>> system.old_solution(dof_indices[l]);
>>>
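>>> For clarity, here is a minimal sketch of the read pattern I mean (the
>>> element loop and the system name "reaction" are only illustrative; the
>>> real assembly also builds the element matrix and vector):
>>>
>>> #include "libmesh/equation_systems.h"
>>> #include "libmesh/mesh_base.h"
>>> #include "libmesh/transient_system.h"
>>> #include "libmesh/linear_implicit_system.h"
>>> #include "libmesh/dof_map.h"
>>>
>>> using namespace libMesh;
>>>
>>> void read_old_solutions(EquationSystems & es)
>>> {
>>>   const MeshBase & mesh = es.get_mesh();
>>>   TransientLinearImplicitSystem & system =
>>>     es.get_system<TransientLinearImplicitSystem>("reaction");
>>>   const DofMap & dof_map = system.get_dof_map();
>>>   std::vector<dof_id_type> dof_indices;
>>>
>>>   // Loop over the elements owned by this processor.
>>>   for (const auto & elem : mesh.active_local_element_ptr_range())
>>>   {
>>>     dof_map.dof_indices(elem, dof_indices);
>>>     for (std::size_t l = 0; l < dof_indices.size(); l++)
>>>     {
>>>       // old_solution() reads from old_local_solution, which is a
>>>       // ghosted local vector: the index must be local to or ghosted
>>>       // on this processor, otherwise the access is out of range.
>>>       const Number u_old = system.old_solution(dof_indices[l]);
>>>       // ... use u_old in the element contribution ...
>>>     }
>>>   }
>>> }
>>>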
>>> I get this error when I run the project in parallel: "Caught signal
>>> number 11 SEGV: Segmentation Violation, probably memory access out of
>>> range".
>>> I even tried calling close() after solving each equation, e.g.
>>> equation_systems.get_system("reaction").solution->close(), which didn't
>>> help.
>>> I don't know how to deal with this problem.
>>>
>>> I would appreciate any help or suggestions.
>>>
>>> Thanks in advance,
>>> Fatemeh
>>>
>>> FYI:
>>>
>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>>
>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>> probably memory access out of range
>>>
>>> [1]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>>
>>> [1]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>
>>> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
>>> OS X to find memory corruption errors
>>>
>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
>>> and run
>>>
>>> [1]PETSC ERROR: to get more information on the crash.
>>>
>>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>
>>> [1]PETSC ERROR: Signal received
>>>
>>> [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>>
>>> [1]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
>>>
>>> [1]PETSC ERROR:
>>> /home/chegini/inverseProblem/LibmeshCode_InParallel_Cluster/./assemble on a
>>> arch-linux2-c-opt named icsnode17 by chegini Tue Nov 13 10:56:57 2018
>>>
>>> [1]PETSC ERROR: Configure options --prefix=/apps/petsc/3.8.3
>>> --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1
>>> --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx
>>> --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1
>>> --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1
>>> --download-scalapack=1 --CC=mpicc --CXX=mpicxx --FC=mpif90 --F77=mpif77
>>> --F90=mpif90 --CFLAGS="-fPIC -fopenmp" --CXXFLAGS="-fPIC -fopenmp"
>>> --FFLAGS="-fPIC -fopenmp" --FCFLAGS="-fPIC -fopenmp" --F90FLAGS="-fPIC
>>> -fopenmp" --F77FLAGS="-fPIC -fopenmp"
>>> PETSC_DIR=/apps/petsc/3.8.3/src/petsc-3.8.3
>>>
>>> [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>>
>>> --------------------------------------------------------------------------
>>>
>>> MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
>>>
>>> with errorcode 59.
>>>
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>
>>> You may or may not see output from other processes, depending on
>>>
>>> exactly when Open MPI kills them.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> In: PMI_Abort(59, N/A)
>>>
>>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>>>
>>> slurmstepd: *** STEP 924595.0 ON icsnode17 CANCELLED AT
>>> 2018-11-13T11:00:55 ***
>>>
>>> srun: error: icsnode17: tasks 0,3,5-16,18-19: Killed
>>>
>>> srun: error: icsnode17: task 1: Exited with exit code 59
>>>
>>> srun: error: icsnode17: tasks 2,4,17: Killed
>>>
>>
>>
>>

_______________________________________________
Libmesh-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-users
