http://qa.mandrakesoft.com/show_bug.cgi?id=4709
------- Additional Comments From [EMAIL PROTECTED] 2003-12-08 18:11 -------
I have seen exactly the same thing with OpenSceneGraph :-( There seems to be
a kernel bug in thread signal handling: one thread dies and the rest wait
for it forever.
Some more info about this problem.
It seems that the stock kernel 2.4.22pre10 also exhibits this behavior.
Moreover, it looks like one thread dies and then the rest wait forever
for a signal which never comes:
-----------------------
$ strace -e trace=signal -f osganimate
rt_sigaction(SIGRTMIN, {0x40544000, [], SA_RESTORER, 0x405e1488}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x405438c0, [], SA_RESTORER, 0x405e1488}, NULL, 8) = 0
rt_sigaction(SIGRT_2, {0x40544040, [], SA_RESTORER, 0x405e1488}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [33], NULL, 8) = 0
Process 5191 attached (waiting for parent)
Process 5191 resumed (parent 5187 ready)
[pid 5191] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 5191] rt_sigprocmask(SIG_SETMASK, ~[TRAP 33], NULL, 8) = 0
[pid 5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid 5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid 5187] rt_sigsuspend([]Process 5192 attached
~ <unfinished ...>
[pid 5192] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 5191] kill(5187, SIGRTMIN) = 0
[pid 5192] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid 5187] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
[pid 5192] rt_sigprocmask(SIG_SETMASK, NULL, <unfinished ...>
[pid 5187] <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
[pid 5192] <... rt_sigprocmask resumed> [RTMIN], 8) = 0
[pid 5187] sigreturn( <unfinished ...>
[pid 5192] rt_sigsuspend([] <unfinished ...>
[pid 5187] <... sigreturn resumed> ) = ? (mask now [RTMIN])
[pid 5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid 5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid 5187] rt_sigsuspend([]Process 5193 attached (waiting for parent)
Process 5193 resumed (parent 5191 ready)
~ <unfinished ...>
[pid 5193] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 5191] kill(5187, SIGRTMIN) = 0
[pid 5193] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid 5187] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
[pid 5187] <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
[pid 5187] sigreturn() = ? (mask now [RTMIN])
[pid 5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid 5187] rt_sigsuspend([] <unfinished ...>
[pid 5193] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid 5193] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
[pid 5193] rt_sigsuspend([]Process 5194 attached (waiting for parent)
Process 5194 resumed (parent 5191 ready)
~ <unfinished ...>
[pid 5194] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 5191] kill(5193, SIGRTMIN <unfinished ...>
[pid 5193] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
Process 5193 detached
^^^^^^^^^^^^^^^^^^^^^^^^^ thread died ?
[pid 5191] <... kill resumed> ) = 0
[pid 5194] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid 5194] kill(5193, SIGRTMIN) = 0
[pid 5194] kill(5193, SIGRTMIN) = 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It seems to be trying to send a signal to a dead thread, and then it waits
forever for a response.
I am not a POSIX threads expert, but this looks abnormal to me. Either some
signal went missing (SIGSEGV, for example, if something crashed and the
thread manager didn't "notice" the disappearance of one of the threads)
or there is something funny going on.
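
One thing to keep in mind when reading the trace above: kill() returning 0
does not prove that the target is still running. A process that has exited
but has not yet been reaped by its parent still owns its pid, so a signal
sent to it is accepted and simply never acted on. Below is a minimal Python
sketch of that general behavior. It is not a model of the LinuxThreads
manager, and the function name signal_exited_child is made up for
illustration.

```python
import os
import signal


def signal_exited_child():
    """Return (signal_accepted_before_reap, pid_gone_after_reap)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit at once; its pipe ends close with it

    os.close(w)
    os.read(r, 1)    # returns b"" (EOF) once the child has closed its write end
    os.close(r)

    # The child is finished but has not been waited for, so its pid is still
    # allocated and kill() reports success.
    try:
        os.kill(pid, signal.SIGUSR1)
        accepted_before_reap = True
    except ProcessLookupError:
        accepted_before_reap = False

    os.waitpid(pid, 0)  # reap the child; the pid is released

    # Signal 0 only checks whether the pid exists; now it should not.
    try:
        os.kill(pid, 0)
        gone_after_reap = False
    except ProcessLookupError:
        gone_after_reap = True

    return accepted_before_reap, gone_after_reap
```

On a typical Linux system this should return (True, True). If something
similar happened here, the repeated "kill(5193, SIGRTMIN) = 0" lines would
be consistent with a thread that is already dead but not yet cleaned up.
That is only a guess from the trace, not a confirmed diagnosis.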
------- Reminder: -------
assigned_to: [EMAIL PROTECTED]
status: UNCONFIRMED
creation_date:
description:
I originally had this problem on Mandrake 9.1, but thought it could be caused by
the specific configuration of that system - I've installed lots of software on it,
including some from Cooker.
Recently I've clean-installed Mandrake 9.2 beta2 (formatting the '/' partition,
and leaving only '/home' intact), and the problem still exists.
The symptoms are: after the system has been running for some time, it becomes
impossible to start new shell processes - bash, csh, any.
All applications that try to launch a shell for some task hang at that stage too
- so non-interactive shells cannot start either.
The shells that are already started are working fine - it's only impossible to
start new ones.
I've run a screen session with multiple windows and waited for the problem to
occur, and then did strace on various programs that launch shells.
They look similar - they usually hang in a waitpid() or read() call.
For example, this strace end comes from bash:
<snip>
stat64("/bin/bash", {st_mode=S_IFREG|0755, st_size=641868, ...}) = 0
getpgrp() = 5960
rt_sigaction(SIGCHLD, {0x80776a0, [], SA_RESTORER, 0x40051c68}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
fcntl64(0, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
_llseek(0, 0, 0xbfffefe8, SEEK_CUR) = -1 ESPIPE (Illegal seek)
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
read(0, <unfinished ...>
<snip>
and this strace end comes from csh:
<snip>
rt_sigprocmask(SIG_SETMASK, [INT RTMIN], [INT RTMIN], 8) = 0
close(0) = -1 EBADF (Bad file descriptor)
dup(19) = 0
fcntl64(0, F_SETFD, 0) = 0
close(1) = -1 EBADF (Bad file descriptor)
dup(17) = 1
fcntl64(1, F_SETFD, 0) = 0
close(2) = -1 EBADF (Bad file descriptor)
dup(18) = 2
fcntl64(2, F_SETFD, 0) = 0
pipe([5, 6]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT RTMIN], 8) = 0
fork() = 5991
gettimeofday({1060688761, 435762}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [INT RTMIN], [INT CHLD RTMIN], 8) = 0
close(6) = 0
read(5, <unfinished ...>
<snip>
and this strace end comes from xterm:
<snip>
close(4) = 0
close(4) = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD RTMIN], 8) = 0
fork() = 5967
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
close(3) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigaction(SIGINT, {0x8076710, [], SA_RESTORER, 0x40051c68}, {SIG_DFL}, 8) = 0
waitpid(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 5966
waitpid(-1, <unfinished ...>
<snip>
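
The csh trace above shows the usual pipe()/fork()/read() pattern: the parent
closes its write end and then blocks in read() until the child writes
something or exits. If the child stalls, the parent stalls with it. Below is
a small Python sketch of the same pattern with a timeout added, so that a
stuck child is reported instead of hanging the caller. The helper name
read_from_child and its parameters are assumptions for illustration, not
anything taken from csh or xterm.

```python
import os
import select
import signal
import time


def read_from_child(child_stall_seconds, timeout_seconds):
    """Fork a child that writes to a pipe; return its data, or None on timeout."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            time.sleep(child_stall_seconds)  # stands in for a child that is stuck
            os.write(w, b"ready")
        finally:
            os._exit(0)

    os.close(w)  # same as close(6) in the trace: keep only the read end
    readable, _, _ = select.select([r], [], [], timeout_seconds)
    data = os.read(r, 16) if readable else None
    os.close(r)

    if data is None:
        os.kill(pid, signal.SIGKILL)  # give up on the stalled child
    os.waitpid(pid, 0)                # always reap, so no zombie is left behind
    return data
```

For example, read_from_child(0, 2.0) should return b"ready", while
read_from_child(5, 0.2) should return None after about 0.2 seconds instead
of blocking the way the plain read(5, ...) does in the trace.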
The launch of programs like xterm or csh can be interrupted with
CTRL-C, but when I run bash, I get the following errors in strace and the
process won't terminate when I press CTRL-C:
<snip>
close(3) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigaction(SIGINT, {0x8076710, [], SA_RESTORER, 0x40051c68}, {0x8087030, [], SA_RESTORER, 0x40051c68}, 8) = 0
waitpid(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 5944
waitpid(-1, 0xbfffe358, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
sigreturn() = ? (mask now [CHLD RTMIN])
waitpid(-1, 0xbfffe358, 0) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
sigreturn() = ? (mask now [CHLD RTMIN])
waitpid(-1,
<snip>
I'm attaching full strace logs.
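
A note on the last trace: when SIGINT has a handler installed and that
handler simply returns, an interrupted waitpid() is started again
(the "ERESTARTSYS (To be restarted)" lines), so CTRL-C alone cannot end
the wait while the child never exits. The Python sketch below shows the same
visible effect. Python (3.5 and later) retries the interrupted call itself
rather than relying on a kernel restart, so this mirrors the behavior, not
the exact mechanism. The function name waitpid_survives_sigint is made up
for illustration.

```python
import os
import signal
import time


def waitpid_survives_sigint(child_lifetime=0.4):
    """Return (number_of_SIGINTs_handled, waitpid_returned_the_child)."""
    interrupts = []
    previous = signal.signal(
        signal.SIGINT, lambda signum, frame: interrupts.append(signum)
    )
    try:
        pid = os.fork()
        if pid == 0:
            try:
                time.sleep(child_lifetime / 2)
                os.kill(os.getppid(), signal.SIGINT)  # simulate CTRL-C
                time.sleep(child_lifetime / 2)
            finally:
                os._exit(0)
        # The handler runs when SIGINT arrives, then the wait is resumed and
        # only finishes when the child really exits.
        reaped, _status = os.waitpid(pid, 0)
    finally:
        # Restore whatever handler was installed before.
        signal.signal(
            signal.SIGINT, previous if previous is not None else signal.SIG_DFL
        )
    return len(interrupts), reaped == pid
```

Called with the default argument, this should return (1, True): the signal
was delivered and handled, but the wait only ended when the child exited.
In the bash trace the child apparently never exits, which would explain why
each CTRL-C just produces another restarted waitpid().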