OK, here are the backtraces of the running processes. There are two processes running:
0 S becsekba 54451 54421 0 80 0 - 76108 futex_ 12:39 pts/92 00:00:00 /opt/slurm/16.05.8/bin/srun -n 8 whale-dbg -i IMP/RunImpact2D.i
1 S becsekba 54477 54451 0 80 0 - 24908 pipe_w 12:39 pts/92 00:00:00 /opt/slurm/16.05.8/bin/srun -n 8 whale-dbg -i IMP/RunImpact2D.i

Attaching gdb to the first gives me this backtrace:

(gdb) bt
#0 0x00002b0f815c003f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000580ee4 in slurm_step_launch_wait_finish (ctx=0x99dd40) at step_launch.c:622
#2 0x00002b0f85db2490 in launch_p_step_wait (job=0x99e3e0, got_alloc=false) at launch_slurm.c:692
#3 0x0000000000587a82 in launch_g_step_wait (job=0x99e3e0, got_alloc=false) at launch.c:523
#4 0x000000000042d27a in srun (ac=6, av=0x7ffd0d0f2c58) at srun.c:288
#5 0x000000000042dc21 in main (argc=6, argv=0x7ffd0d0f2c58) at srun.wrapper.c:17

Attaching gdb to the second gives me this backtrace:

(gdb) bt
#0 0x00002b0f815c2a60 in __read_nocancel () from /lib64/libpthread.so.0
#1 0x00000000005918f7 in _shepard_spawn (job=0x99e3e0, got_alloc=false) at srun_job.c:1383
#2 0x000000000058fe15 in create_srun_job (p_job=0x7ecd00 <job>, got_alloc=0x7ffd0d0f2a6f, slurm_started=false, handle_signals=true) at srun_job.c:652
#3 0x000000000042cd6c in srun (ac=6, av=0x7ffd0d0f2c58) at srun.c:194
#4 0x000000000042dc21 in main (argc=6, argv=0x7ffd0d0f2c58) at srun.wrapper.c:17

–Barna

> On 12 Jan 2017, at 17:51, Roy Stogner <royst...@ices.utexas.edu> wrote:
>
>
> On Thu, 12 Jan 2017, Barna Becsek wrote:
>
>> What I meant was the program will not exit gather_neighboring_elements. I
>> think the processes are still running.
>
> Right. But you can e.g. attach gdb to a running process to get a
> stack trace. If there's an infinite loop then we can at least find
> out *where* it's looping.
> ---
> Roy

_______________________________________________
Libmesh-users mailing list
Libmesh-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-users
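
For repeating the attach step on several processes at once, here is a minimal sketch of how it could be scripted. It assumes gdb is on the PATH and that you are allowed to attach to the target PIDs; the function names `gdb_backtrace_command` and `collect_backtraces` are hypothetical helpers, not part of gdb, Slurm, or libMesh. It relies only on gdb's `-p`, `-batch`, and `-ex` command-line options.

```python
import subprocess


def gdb_backtrace_command(pid, all_threads=True):
    """Build a gdb command line that attaches to `pid`, prints a
    backtrace, and then exits (batch mode detaches on exit)."""
    # "thread apply all bt" dumps every thread; plain "bt" only the current one.
    bt = "thread apply all bt" if all_threads else "bt"
    return ["gdb", "-p", str(pid), "-batch", "-ex", bt]


def collect_backtraces(pids, run=subprocess.run):
    """Run gdb once per PID and return {pid: gdb stdout}.

    `run` is injectable so the function can be exercised without gdb.
    """
    traces = {}
    for pid in pids:
        result = run(gdb_backtrace_command(pid), capture_output=True, text=True)
        traces[pid] = result.stdout
    return traces


# Example use (PIDs taken from the ps listing in this thread):
#   for pid, trace in collect_backtraces([54451, 54477]).items():
#       print(f"=== PID {pid} ===")
#       print(trace)
```

Taking the runner as a parameter is only a convenience for trying the logic on a machine without gdb; in normal use the default `subprocess.run` is what invokes gdb.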