I think these processes also belong to the same execution:

0 S becsekba 54421 52395 0 80 0 - 4438 wait 12:39 pts/92 00:00:00 /bin/bash /apps/daint/UES/xalt/0.7.6/bin/srun -n 8 whale-dbg -i IMP/RunImpact2D.i

(gdb) bt
#0  0x00002b641c273cec in waitpid () from /lib64/libc.so.6
#1  0x00000000004297aa in run_sigchld_trap ()
#2  0x000000000042aabb in wait_for ()
#3  0x0000000000462223 in execute_command_internal ()
#4  0x0000000000460678 in execute_command_internal ()
#5  0x0000000000462961 in execute_command ()
#6  0x000000000046063c in execute_command_internal ()
#7  0x0000000000462961 in execute_command ()
#8  0x0000000000462152 in execute_command_internal ()
#9  0x0000000000460678 in execute_command_internal ()
#10 0x0000000000462961 in execute_command ()
#11 0x000000000046063c in execute_command_internal ()
#12 0x0000000000462961 in execute_command ()
#13 0x000000000046063c in execute_command_internal ()
#14 0x0000000000462961 in execute_command ()
#15 0x000000000046063c in execute_command_internal ()
#16 0x0000000000462961 in execute_command ()
#17 0x000000000046063c in execute_command_internal ()
#18 0x0000000000462394 in execute_command_internal ()
#19 0x000000000042313e in shell_execve ()
#20 0x000000000045f987 in coproc_reap ()
#21 0x0000000000460510 in execute_command_internal ()
#22 0x0000000000462961 in execute_command ()
#23 0x000000000041b7f1 in reader_loop ()
#24 0x000000000041b4db in main ()
and

0 S becsekba 52395 49463 0 80 0 - 7458 wait 12:38 pts/92 00:00:00 /usr/local/bin/bash

(gdb) bt
#0  0x00002b2e6b302cec in waitpid () from /lib64/libc.so.6
#1  0x0000000000429781 in run_sigchld_trap ()
#2  0x000000000042aa92 in wait_for ()
#3  0x000000000046222f in execute_command_internal ()
#4  0x0000000000462951 in execute_command ()
#5  0x000000000041b7f1 in reader_loop ()
#6  0x000000000041b4db in main ()

> On 13 Jan 2017, at 12:44, Barna Becsek <barnabec...@gmail.com> wrote:
>
> Ok, these are the backtraces of the running processes. There are two processes running:
>
> 0 S becsekba 54451 54421 0 80 0 - 76108 futex_ 12:39 pts/92 00:00:00 /opt/slurm/16.05.8/bin/srun -n 8 whale-dbg -i IMP/RunImpact2D.i
> 1 S becsekba 54477 54451 0 80 0 - 24908 pipe_w 12:39 pts/92 00:00:00 /opt/slurm/16.05.8/bin/srun -n 8 whale-dbg -i IMP/RunImpact2D.i
>
> Attaching gdb to the first gives me this backtrace:
> (gdb) bt
> #0  0x00002b0f815c003f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x0000000000580ee4 in slurm_step_launch_wait_finish (ctx=0x99dd40) at step_launch.c:622
> #2  0x00002b0f85db2490 in launch_p_step_wait (job=0x99e3e0, got_alloc=false) at launch_slurm.c:692
> #3  0x0000000000587a82 in launch_g_step_wait (job=0x99e3e0, got_alloc=false) at launch.c:523
> #4  0x000000000042d27a in srun (ac=6, av=0x7ffd0d0f2c58) at srun.c:288
> #5  0x000000000042dc21 in main (argc=6, argv=0x7ffd0d0f2c58) at srun.wrapper.c:17
>
> Attaching gdb to the second gives me this backtrace:
> (gdb) bt
> #0  0x00002b0f815c2a60 in __read_nocancel () from /lib64/libpthread.so.0
> #1  0x00000000005918f7 in _shepard_spawn (job=0x99e3e0, got_alloc=false) at srun_job.c:1383
> #2  0x000000000058fe15 in create_srun_job (p_job=0x7ecd00 <job>, got_alloc=0x7ffd0d0f2a6f, slurm_started=false, handle_signals=true) at srun_job.c:652
> #3  0x000000000042cd6c in srun (ac=6, av=0x7ffd0d0f2c58) at srun.c:194
> #4  0x000000000042dc21 in main (argc=6, argv=0x7ffd0d0f2c58) at srun.wrapper.c:17
>
> –Barna
>
>> On 12 Jan 2017, at 17:51, Roy Stogner <royst...@ices.utexas.edu> wrote:
>>
>> On Thu, 12 Jan 2017, Barna Becsek wrote:
>>
>>> What I meant was that the program will not exit gather_neighboring_elements. I think the processes are still running.
>>
>> Right. But you can e.g. attach gdb to a running process to get a
>> stack trace. If there's an infinite loop then we can at least find
>> out *where* it's looping.
>> ---
>> Roy
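
For anyone following the thread, the attach procedure Roy describes boils down to roughly the following (a sketch, not verified on this system; the binary name whale-dbg comes from the ps output above, and pgrep may not be available everywhere):

  # List candidate PIDs by matching the full command line
  ps -ef | grep whale-dbg

  # Attach to one PID, print its backtrace, then exit (detaches
  # cleanly without killing the process)
  gdb -p <PID> -batch -ex bt

  # Or loop over every matching PID (this will also catch the srun
  # wrappers, whose command lines contain the binary name) to compare
  # where each process is stuck
  for pid in $(pgrep -f whale-dbg); do
      echo "=== PID $pid ==="
      gdb -p "$pid" -batch -ex bt
  done

If every compute rank shows the same frame inside gather_neighboring_elements while srun itself just waits, that points at a hang in the parallel code rather than in the launcher.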