Hi,

The runtime tries to abort all threads and waits for them to terminate, so if a thread refuses to die for some reason, the runtime will hang. It's possible that the serial port code doesn't check for thread aborts/interruptions.
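Below is a minimal sketch of what "checking for interruptions" can look like. It assumes a POSIX system and uses Python only for illustration. It is not Mono's serial.c or its Thread.Abort() machinery. A pipe stands in for the serial port, and reader_loop and run_demo are hypothetical names. The reader does not block in read() indefinitely. Instead it waits with a timeout and re-checks a stop flag, which lets the shutdown code's join() return.

```python
import os
import select
import threading
import time


def reader_loop(fd, stop, chunks, poll_interval=0.05):
    """Read from fd until stop is set, never blocking longer than poll_interval."""
    while not stop.is_set():
        # Bounded wait: return after poll_interval seconds even if no data
        # arrived, so the stop flag is re-checked instead of sitting in
        # read() forever.
        ready, _, _ = select.select([fd], [], [], poll_interval)
        if ready:
            data = os.read(fd, 1024)
            if not data:  # writer closed its end (EOF)
                break
            chunks.append(data)


def run_demo(timeout=2.0):
    """Start a reader, feed it some data, then ask it to stop and join it."""
    read_fd, write_fd = os.pipe()  # hypothetical stand-in for a serial port
    stop = threading.Event()
    chunks = []
    # daemon=True so that, if the reader ever failed to exit, it could not
    # keep the interpreter alive at exit.
    worker = threading.Thread(
        target=reader_loop, args=(read_fd, stop, chunks), daemon=True
    )
    worker.start()
    try:
        os.write(write_fd, b"hello")
        deadline = time.monotonic() + timeout
        while not chunks and time.monotonic() < deadline:
            time.sleep(0.01)
        # The write end is still open, so a plain blocking os.read() would
        # now wait forever; the stop flag is what lets the reader exit.
        stop.set()
        worker.join(timeout)
        return worker.is_alive(), b"".join(chunks)
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    alive, data = run_demo()
    print("reader still alive:", alive, "data:", data)
```

If reader_loop called os.read() directly without the bounded select(), join() would have nothing to wake it. This resembles Thread 1 in the backtrace quoted below, where mono_thread_manage() sits in wait_for_tids() with an infinite timeout. The trade-off is that shutdown can take up to poll_interval longer.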
Zoltan

On Thu, Sep 17, 2009 at 2:33 PM, Leszek Ciesielski <skol...@gmail.com> wrote:
> Oh! We found something with "mono --trace" that we missed before. It
> seems that we are Thread.Abort()'ing a thread that's inside
> SerialPort.Read() (and through this and serial.c - in kernel mode) and
> the abort gets ignored. However, on the managed side, everything
> proceeds as though the thread was killed - until only unmanaged code
> remains running - including the JITed rogue thread. I am checking this
> now with a small test case and will send it along once I am able to
> reproduce the problem.
>
> On Thu, Sep 17, 2009 at 1:49 PM, Leszek Ciesielski <skol...@gmail.com> wrote:
> > That's the
> >
> >> kill -3 PID prints:
> >
> >> "0" tid=0x0xb7d206f0 this=0x0x2fed8 thread handle 0x404 state: waiting
> >> on 0x400 : Event owns ()
> >
> > result, nothing more is printed...
> >
> > On Thu, Sep 17, 2009 at 1:25 PM, Zoltan Varga <var...@gmail.com> wrote:
> >> Hi,
> >>
> >> My mistake. You should send a SIGQUIT signal.
> >>
> >> Zoltan
> >>
> >> On Thu, Sep 17, 2009 at 12:58 PM, Leszek Ciesielski <skol...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> kill -SIGUSR1 PID prints
> >>>
> >>> User defined signal 1
> >>>
> >>> And Mono terminates. Does this suggest no managed threads were left
> >>> (there are 10 or 11 while the application is running)? gdb native
> >>> stack trace follows:
> >>>
> >>> 0xffffe430 in __kernel_vsyscall ()
> >>> (gdb) thread apply all bt
> >>>
> >>> Thread 4 (Thread 0xb7573b90 (LWP 25150)):
> >>> #0 0xffffe430 in __kernel_vsyscall ()
> >>> #1 0xb7ee73f6 in nanosleep () from /lib/libpthread.so.0
> >>> #2 0x081a91f8 in collection_thread (unused=0x0) at collection.c:34
> >>> #3 0xb7ee01b5 in start_thread () from /lib/libpthread.so.0
> >>> #4 0xb7e263be in clone () from /lib/libc.so.6
> >>>
> >>> Thread 3 (Thread 0xb754fb90 (LWP 25151)):
> >>> #0 0xffffe430 in __kernel_vsyscall ()
> >>> #1 0xb7ee5ef5 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
> >>> #2 0x0812eed9 in finalizer_thread (unused=0x0) at gc.c:1058
> >>> #3 0x08153188 in start_wrapper (data=0x8305078) at threads.c:623
> >>> #4 0x081c5d66 in thread_start_routine (args=0x82faaa4) at threads.c:286
> >>> #5 0x081e5aa5 in GC_start_routine (arg=0x26f20) at pthread_support.c:1382
> >>> #6 0xb7ee01b5 in start_thread () from /lib/libpthread.so.0
> >>> #7 0xb7e263be in clone () from /lib/libc.so.6
> >>>
> >>> Thread 2 (Thread 0xb565ab90 (LWP 25339)):
> >>> #0 0xb7efe3da in clock_gettime () from /lib/librt.so.1
> >>> #1 0x081d5705 in mono_100ns_ticks () at mono-time.c:107
> >>> #2 0xb568bf66 in ?? ()
> >>> #3 0xb568bf23 in ?? ()
> >>> #4 0xb568af80 in ?? ()
> >>> #5 0xb7916ba0 in ?? ()
> >>> #6 0x08110f14 in mono_runtime_delegate_invoke (delegate=0x1a6b712, params=0xb565a2e4, exc=0x0) at object.c:2943
> >>> #7 0x0815320f in start_wrapper (data=0x0) at threads.c:629
> >>> #8 0x081c5d66 in thread_start_routine (args=0x82faff4) at threads.c:286
> >>> #9 0x081e5aa5 in GC_start_routine (arg=0x2dffe0) at pthread_support.c:1382
> >>> #10 0xb7ee01b5 in start_thread () from /lib/libpthread.so.0
> >>> #11 0xb7e263be in clone () from /lib/libc.so.6
> >>>
> >>> Thread 1 (Thread 0xb7d206f0 (LWP 25117)):
> >>> #0 0xffffe430 in __kernel_vsyscall ()
> >>> #1 0xb7ee3c35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> >>> #2 0x081af0b1 in _wapi_handle_timedwait_signal_handle (handle=0x400, timeout=0x0, alertable=1, poll=0) at handles.c:1605
> >>> #3 0x081af1b7 in _wapi_handle_wait_signal (poll=0) at handles.c:1534
> >>> #4 0x081cac2b in WaitForMultipleObjectsEx (numobjects=2, handles=0x8c0a900, waitall=1, timeout=4294967295, alertable=0) at wait.c:723
> >>> #5 0x081510b1 in wait_for_tids (wait=0x8c0a900, timeout=365) at threads.c:2443
> >>> #6 0x0815488c in mono_thread_manage () at threads.c:2733
> >>> #7 0x080b25cd in mono_main (argc=2, argv=0xbfafbdb4) at driver.c:1648
> >>> #8 0x0805af21 in main (argc=Cannot access memory at address 0x80
> >>> ) at main.c:34
> >>> #0 0xffffe430 in __kernel_vsyscall ()
> >>>
> >>> Regards,
> >>>
> >>> skolima
> >>>
> >>> On Thu, Sep 17, 2009 at 12:25 PM, Zoltan Varga <var...@gmail.com> wrote:
> >>> > Hi,
> >>> >
> >>> > You can attach to the hung process with gdb and type
> >>> > 'thread apply all bt' to get a native backtrace, and/or
> >>> > send a SIGUSR1 signal to the process to print a managed backtrace.
> >>> >
> >>> > Zoltan
> >>> >
> >>> > On Thu, Sep 17, 2009 at 12:15 PM, Leszek Ciesielski <skol...@gmail.com> wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> We have been trying to isolate the problem for almost a month; the best
> >>> >> we have managed is a hardware configuration for our application that
> >>> >> hangs on every exit - but this is with about 8MB of binaries, probably
> >>> >> over 100k SLOC. What I am hoping for now are some gdb guidelines to
> >>> >> pinpoint the problem.
> >>> >>
> >>> >> Regards
> >>> >>
> >>> >> On Thu, Sep 17, 2009 at 12:02 PM, Zoltan Varga <var...@gmail.com> wrote:
> >>> >> > Hi,
> >>> >> >
> >>> >> > Could you create some kind of test case to help us debug this issue?
> >>> >> >
> >>> >> > Zoltan
> >>> >> >
> >>> >> > On Thu, Sep 17, 2009 at 11:28 AM, Leszek Ciesielski <skol...@gmail.com> wrote:
> >>> >> >>
> >>> >> >> Hi,
> >>> >> >>
> >>> >> >> I am experiencing a Mono hang when my application should terminate.
> >>> >> >> The application opens multiple serial ports, but the bug has also
> >>> >> >> manifested when network sockets were hanging on reads or writes - it
> >>> >> >> seems to be related to a pending I/O operation; asynchronous
> >>> >> >> networking helps somewhat. Anyway, the managed code exits, Mono CPU
> >>> >> >> usage jumps to 100%, /proc/PID/status shows 4 threads and the
> >>> >> >> application never exits. kill -3 PID prints:
> >>> >> >>
> >>> >> >> "0" tid=0x0xb7d0f6f0 this=0x0x2fed8 thread handle 0x404 state: waiting
> >>> >> >> on 0x400 : Event owns ()
> >>> >> >>
> >>> >> >> and that's all. What can I do to help debug this?
> >>> >> >>
> >>> >> >> BTW this happens on 1.9 (Debian and Gentoo) and 2.4.2.3 (Debian and
> >>> >> >> OpenSuse) [so I'm pretty sure it's not distribution-specific], more
> >>> >> >> often if the application uses System.Windows.Forms.
> >>> >> >>
> >>> >> >> Regards,
> >>> >> >>
> >>> >> >> Leszek 'skolima' Ciesielski
> >>> >> >> _______________________________________________
> >>> >> >> Mono-devel-list mailing list
> >>> >> >> Mono-devel-list@lists.ximian.com
> >>> >> >> http://lists.ximian.com/mailman/listinfo/mono-devel-list