Hi Kosta,
From the backtrace of the main thread, it looks like the main thread didn't
receive the checkpoint signal (SIGUSR2) from the ckpt-thread. I am not sure
what caused it. Is this something I can reproduce on one of my local
machines?
Kapil
On Fri, Aug 16, 2013 at 7:55 PM, Kosta Malolin
<kosta.malo...@ericsson.com> wrote:
> I am seeing this issue when trying to dump the state of an application.
>
> Here is the state of the application when examined with gdb:
>
> (gdb) info thread
>   Id   Target Id         Frame
>   3    Thread 0x422e1940 (LWP 12723) 0x00002b69d547b23f in mtcp_futex
>        (uaddr=0x2b69d5687fd8, op=0, val=2, timeout=0x2b69d547d280) at
>        mtcp_futex.h:24
>   2    Thread 0x42ce2940 (LWP 12724) 0x00002b69d547b23f in mtcp_futex
>        (uaddr=0x22371848, op=0, val=5, timeout=0x0) at mtcp_futex.h:24
> * 1    Thread 0x2b69d8b19f20 (LWP 12722) 0x00002b69d6d86541 in nanosleep
>        () from /lib64/libc.so.6
>
> (gdb) bt
> #0  0x00002b69d6d86541 in nanosleep () from /lib64/libc.so.6
> #1  0x00002b69d6db9ed4 in usleep () from /lib64/libc.so.6
> #2  0x000000000040cdbc in AmberSimLoop::simLoop() ()
> #3  0x000000000040799b in main ()
>
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x42ce2940 (LWP 12724))]
> #0  0x00002b69d547b23f in mtcp_futex (uaddr=0x22371848, op=0, val=5,
>     timeout=0x0) at mtcp_futex.h:24
> 24        asm volatile ("syscall"
>
> (gdb) bt
> #0  0x00002b69d547b23f in mtcp_futex (uaddr=0x22371848, op=0, val=5,
>     timeout=0x0) at mtcp_futex.h:24
> #1  0x00002b69d547b1e4 in mtcp_state_futex (state=0x22371848, func=0,
>     val=5, timeout=0x0) at mtcp_state.c:47
> #2  0x00002b69d54739a7 in stopthisthread (signum=12) at mtcp.c:3474
> #3  <signal handler called>
> #4  0x00002b69d6dc08a8 in epoll_wait () from /lib64/libc.so.6
> #5  0x00000000007d7bcd in AmberPciePortHandler::handleSlaveRequests() ()
> #6  0x000000000040ef59 in spawnPcieServer(void*) ()
> #7  0x00002b69d5cd373d in start_thread () from /lib64/libpthread.so.0
> #8  0x00002b69d546e957 in threadcloned (threadv=0x22371830) at mtcp.c:1231
> #9  0x00002b69d6dc04bd in clone () from /lib64/libc.so.6
> #10 0x0000000000000000 in ?? ()
>
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x422e1940 (LWP 12723))]
> #0  0x00002b69d547b23f in mtcp_futex (uaddr=0x2b69d5687fd8, op=0, val=2,
>     timeout=0x2b69d547d280) at mtcp_futex.h:24
> 24        asm volatile ("syscall"
>
> (gdb) bt
> #0  0x00002b69d547b23f in mtcp_futex (uaddr=0x2b69d5687fd8, op=0, val=2,
>     timeout=0x2b69d547d280) at mtcp_futex.h:24
> #1  0x00002b69d547b1e4 in mtcp_state_futex (state=0x2b69d5687fd8, func=0,
>     val=2, timeout=0x2b69d547d280) at mtcp_state.c:47
> #2  0x00002b69d546fc90 in checkpointhread (dummy=0x0) at mtcp.c:1998
> #3  0x00002b69d5cd373d in start_thread () from /lib64/libpthread.so.0
> #4  0x00002b69d546e957 in threadcloned (threadv=0x1b98fb70) at mtcp.c:1231
> #5  0x00002b69d6dc04bd in clone () from /lib64/libc.so.6
> #6  0x0000000000000000 in ?? ()
>
> (gdb)
>
> Apparently, an attempt to dump a checkpoint was made while thread 1 was
> in nanosleep() and thread 2 was in epoll_wait().
>
> This resulted in a deadlock. Any ideas on what is going on?
>
> -Kosta
>
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> Dmtcp-forum mailing list
> Dmtcp-forum@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
>
>