I think these two processes (the xalt srun wrapper script and its parent bash shell) also belong to the execution:

0 S becsekba  54421  52395  0  80   0 -  4438 wait   12:39 pts/92   00:00:00 
/bin/bash /apps/daint/UES/xalt/0.7.6/bin/srun -n 8 whale-dbg -i 
IMP/RunImpact2D.i
(gdb) bt
#0  0x00002b641c273cec in waitpid () from /lib64/libc.so.6
#1  0x00000000004297aa in run_sigchld_trap ()
#2  0x000000000042aabb in wait_for ()
#3  0x0000000000462223 in execute_command_internal ()
#4  0x0000000000460678 in execute_command_internal ()
#5  0x0000000000462961 in execute_command ()
#6  0x000000000046063c in execute_command_internal ()
#7  0x0000000000462961 in execute_command ()
#8  0x0000000000462152 in execute_command_internal ()
#9  0x0000000000460678 in execute_command_internal ()
#10 0x0000000000462961 in execute_command ()
#11 0x000000000046063c in execute_command_internal ()
#12 0x0000000000462961 in execute_command ()
#13 0x000000000046063c in execute_command_internal ()
#14 0x0000000000462961 in execute_command ()
#15 0x000000000046063c in execute_command_internal ()
#16 0x0000000000462961 in execute_command ()
#17 0x000000000046063c in execute_command_internal ()
#18 0x0000000000462394 in execute_command_internal ()
#19 0x000000000042313e in shell_execve ()
#20 0x000000000045f987 in coproc_reap ()
#21 0x0000000000460510 in execute_command_internal ()
#22 0x0000000000462961 in execute_command ()
#23 0x000000000041b7f1 in reader_loop ()
#24 0x000000000041b4db in main ()

and 

0 S becsekba  52395  49463  0  80   0 -  7458 wait   12:38 pts/92   00:00:00 
/usr/local/bin/bash
(gdb) bt
#0  0x00002b2e6b302cec in waitpid () from /lib64/libc.so.6
#1  0x0000000000429781 in run_sigchld_trap ()
#2  0x000000000042aa92 in wait_for ()
#3  0x000000000046222f in execute_command_internal ()
#4  0x0000000000462951 in execute_command ()
#5  0x000000000041b7f1 in reader_loop ()
#6  0x000000000041b4db in main ()


> On 13 Jan 2017, at 12:44, Barna Becsek <barnabec...@gmail.com> wrote:
> 
> Ok, this is the backtrace of the running processes. There are two processes 
> running:
> 
> 0 S becsekba  54451  54421  0  80   0 - 76108 futex_ 12:39 pts/92   00:00:00 
> /opt/slurm/16.05.8/bin/srun -n 8 whale-dbg -i IMP/RunImpact2D.i
> 1 S becsekba  54477  54451  0  80   0 - 24908 pipe_w 12:39 pts/92   00:00:00 
> /opt/slurm/16.05.8/bin/srun -n 8 whale-dbg -i IMP/RunImpact2D.i
> 
> attaching gdb to the first gives me this stack trace:
> (gdb) bt
> #0  0x00002b0f815c003f in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x0000000000580ee4 in slurm_step_launch_wait_finish (ctx=0x99dd40) at 
> step_launch.c:622
> #2  0x00002b0f85db2490 in launch_p_step_wait (job=0x99e3e0, got_alloc=false) 
> at launch_slurm.c:692
> #3  0x0000000000587a82 in launch_g_step_wait (job=0x99e3e0, got_alloc=false) 
> at launch.c:523
> #4  0x000000000042d27a in srun (ac=6, av=0x7ffd0d0f2c58) at srun.c:288
> #5  0x000000000042dc21 in main (argc=6, argv=0x7ffd0d0f2c58) at 
> srun.wrapper.c:17
> 
> attaching gdb to the second gives me this stack trace:
> (gdb) bt
> #0  0x00002b0f815c2a60 in __read_nocancel () from /lib64/libpthread.so.0
> #1  0x00000000005918f7 in _shepard_spawn (job=0x99e3e0, got_alloc=false) at 
> srun_job.c:1383
> #2  0x000000000058fe15 in create_srun_job (p_job=0x7ecd00 <job>, 
> got_alloc=0x7ffd0d0f2a6f, slurm_started=false, handle_signals=true) at 
> srun_job.c:652
> #3  0x000000000042cd6c in srun (ac=6, av=0x7ffd0d0f2c58) at srun.c:194
> #4  0x000000000042dc21 in main (argc=6, argv=0x7ffd0d0f2c58) at 
> srun.wrapper.c:17
> 
> –Barna
> 
>> On 12 Jan 2017, at 17:51, Roy Stogner <royst...@ices.utexas.edu> wrote:
>> 
>> 
>> On Thu, 12 Jan 2017, Barna Becsek wrote:
>> 
>>> What I meant was the program will not exit gather_neighboring_elements. I 
>>> think the processes are still running.
>> 
>> Right.  But you can e.g. attach gdb to a running process to get a
>> stack trace.  If there's an infinite loop then we can at least find
>> out *where* it's looping.
>> ---
>> Roy
> 


_______________________________________________
Libmesh-users mailing list
Libmesh-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-users
