http://qa.mandrakesoft.com/show_bug.cgi?id=4709





------- Additional Comments From [EMAIL PROTECTED]  2003-12-08 18:11 -------
I have seen exactly the same thing with OpenSceneGraph :-( It seems that there
is a kernel bug in thread signal handling - one thread dies and the rest
wait for it forever.
  
Some more info about this problem.  
  
It seems that the stock kernel 2.4.22pre10 also exhibits this behavior.
  
Moreover, it looks like one thread dies and then the rest wait forever
for a signal which never comes:
  
-----------------------  
  
$ strace -e trace=signal -f osganimate  
rt_sigaction(SIGRTMIN, {0x40544000, [], SA_RESTORER, 0x405e1488}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x405438c0, [], SA_RESTORER, 0x405e1488}, NULL, 8) = 0
rt_sigaction(SIGRT_2, {0x40544040, [], SA_RESTORER, 0x405e1488}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [RTMIN], NULL, 8) = 0  
rt_sigprocmask(SIG_UNBLOCK, [33], NULL, 8) = 0  
Process 5191 attached (waiting for parent)  
Process 5191 resumed (parent 5187 ready)  
[pid  5191] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---  
[pid  5191] rt_sigprocmask(SIG_SETMASK, ~[TRAP 33], NULL, 8) = 0  
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0  
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0  
[pid  5187] rt_sigsuspend([]Process 5192 attached  
~ <unfinished ...>  
[pid  5192] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---  
[pid  5191] kill(5187, SIGRTMIN)        = 0  
[pid  5192] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0  
[pid  5187] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---  
[pid  5192] rt_sigprocmask(SIG_SETMASK, NULL,  <unfinished ...>  
[pid  5187] <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
[pid  5192] <... rt_sigprocmask resumed> [RTMIN], 8) = 0  
[pid  5187] sigreturn( <unfinished ...>  
[pid  5192] rt_sigsuspend([] <unfinished ...>  
[pid  5187] <... sigreturn resumed> )   = ? (mask now [RTMIN])  
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0  
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0  
[pid  5187] rt_sigsuspend([]Process 5193 attached (waiting for parent)  
Process 5193 resumed (parent 5191 ready)  
~ <unfinished ...>  
[pid  5193] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---  
[pid  5191] kill(5187, SIGRTMIN)        = 0  
[pid  5193] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0  
[pid  5187] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---  
[pid  5187] <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
[pid  5187] sigreturn()                 = ? (mask now [RTMIN])  
[pid  5187] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0  
[pid  5187] rt_sigsuspend([] <unfinished ...>  
[pid  5193] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0  
[pid  5193] rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0  
[pid  5193] rt_sigsuspend([]Process 5194 attached (waiting for parent)  
Process 5194 resumed (parent 5191 ready)  
~ <unfinished ...>  
[pid  5194] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---  
[pid  5191] kill(5193, SIGRTMIN <unfinished ...>  
[pid  5193] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---  
Process 5193 detached  
  
^^^^^^^^^^^^^^^^^^^^^^^^^ thread died ?  
  
[pid  5191] <... kill resumed> )        = 0  
[pid  5194] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0  
[pid  5194] kill(5193, SIGRTMIN)        = 0  
[pid  5194] kill(5193, SIGRTMIN)        = 0  
  
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  
It seems to be trying to send a signal to a dead thread, and then it waits
forever for a response.
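
To illustrate the "waits for a signal which never comes" pattern seen in the
rt_sigsuspend()/kill(..., SIGRTMIN) pairs above, here is a minimal sketch in
Python. It is not the LinuxThreads implementation; the function name
wait_for_restart and the timeout are hypothetical, and it assumes Linux and a
single-threaded caller. The waiter blocks SIGRTMIN and waits for it, while a
forked helper either sends the signal or exits without sending it. Without
the timeout, the second case would block forever, much like the hang above.

```python
import os
import signal


def wait_for_restart(send_signal, timeout=0.5):
    """Return True if the wake-up signal arrived, False if the wait timed out.

    Hypothetical helper: a forked child plays the role of the thread that is
    supposed to send the restart signal.
    """
    sig = signal.SIGRTMIN
    # Block the signal first so it stays pending until we explicitly wait.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        child = os.fork()
        if child == 0:
            try:
                if send_signal:
                    os.kill(os.getppid(), sig)
            finally:
                os._exit(0)  # the "waker" goes away either way
        # sigtimedwait() returns None when the timeout expires.
        info = signal.sigtimedwait({sig}, timeout)
        os.waitpid(child, 0)  # reap the helper
        return info is not None
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

Calling wait_for_restart(False) models a waker that died before sending the
signal; a waiter without a timeout has no way to notice that.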
  
  
I am not a POSIX threads expert, but this looks abnormal to me. Either some
signal went missing (SIGSEGV, for example, if something crashed and the
thread manager didn't "notice" the disappearance of one of the threads)
or there is something funny going on.
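
One possible reading of kill(5193, SIGRTMIN) returning 0 after the thread is
gone (an assumption, not something the trace proves): a process that has
exited but has not yet been reaped still counts as existing, so kill()
succeeds even though nobody will ever handle the signal. The sketch below
shows that general behavior with an ordinary forked child; the names probe
and zombie_kill_demo are hypothetical and it assumes Linux.

```python
import os


def probe(pid):
    """Return True if kill(pid, 0) succeeds, False if the pid is unknown."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permission
        return True
    except ProcessLookupError:
        return False


def zombie_kill_demo():
    """Probe a child while it is an unreaped zombie and again after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately
    # Wait until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    while_zombie = probe(pid)
    os.waitpid(pid, 0)  # now reap it
    after_reap = probe(pid)
    return while_zombie, after_reap
```

On Linux I would expect the first probe to succeed and the second to fail,
which means a successful kill() alone does not show the target is still able
to respond.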
  
  
  



------- Reminder: -------
assigned_to: [EMAIL PROTECTED]
status: UNCONFIRMED
creation_date: 
description: 
I originally had this problem on Mandrake 9.1, but thought it could be caused by
specific configuration of that system - I've installed lots of software on it,
and some from Cooker.

Recently I've clean-installed Mandrake 9.2 beta2 (formatting the '/' partition,
and leaving only '/home' intact), and the problem still exists.

The symptoms are: after the system has been running for some time, it becomes
impossible to start new shell processes - bash, csh, any.

All applications that try to launch a shell for some task hang at that stage too
- so non-interactive shells cannot start either.

The shells that are already started are working fine - it's only impossible to
start new ones.

I've run a screen session with multiple windows and waited for the problem to
occur, and then did strace on various programs that launch shells.

They look similar - they usually hang in the waitpid() or read() function.
For example this strace end comes from bash:
<snip>
stat64("/bin/bash", {st_mode=S_IFREG|0755, st_size=641868, ...}) = 0
getpgrp()                               = 5960
rt_sigaction(SIGCHLD, {0x80776a0, [], SA_RESTORER, 0x40051c68}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
fcntl64(0, F_GETFL)                     = 0x2 (flags O_RDWR)
fstat64(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
_llseek(0, 0, 0xbfffefe8, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
read(0,  <unfinished ...>
<snip>

and this strace end comes from csh:

<snip>
rt_sigprocmask(SIG_SETMASK, [INT RTMIN], [INT RTMIN], 8) = 0
close(0)                                = -1 EBADF (Bad file descriptor)
dup(19)                                 = 0
fcntl64(0, F_SETFD, 0)                  = 0
close(1)                                = -1 EBADF (Bad file descriptor)
dup(17)                                 = 1
fcntl64(1, F_SETFD, 0)                  = 0
close(2)                                = -1 EBADF (Bad file descriptor)
dup(18)                                 = 2
fcntl64(2, F_SETFD, 0)                  = 0
pipe([5, 6])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT RTMIN], 8) = 0
fork()                                  = 5991
gettimeofday({1060688761, 435762}, NULL) = 0
rt_sigprocmask(SIG_SETMASK, [INT RTMIN], [INT CHLD RTMIN], 8) = 0
close(6)                                = 0
read(5,  <unfinished ...>
<snip>
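
For reference, the pipe()/fork()/close(6)/read(5) sequence in the csh trace
follows a common pattern where the parent blocks in read() until the child
writes to the pipe or closes its copy of the write end. The sketch below
reproduces only that generic pattern, not what csh actually does internally;
pipe_wait_demo is a hypothetical name and the child simply sleeps to stand in
for a child that never makes progress.

```python
import os
import select
import signal
import time


def pipe_wait_demo():
    """Show that the parent's read end stays silent while the child lives."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            time.sleep(60)  # stuck child: keeps the write end open
        finally:
            os._exit(1)
    os.close(w)  # parent keeps only the read end, like close(6) above
    # Nothing to read yet: a plain read() here would block.
    ready_before, _, _ = select.select([r], [], [], 0.2)
    os.kill(pid, signal.SIGKILL)  # remove the stuck child
    os.waitpid(pid, 0)
    data = os.read(r, 1)  # all write ends are closed now, so this is EOF
    os.close(r)
    return bool(ready_before), data
```

As long as the child neither writes nor exits, the parent stays in read(),
which matches the "read(5,  <unfinished ...>" line at the end of the trace.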

and this strace end comes from xterm:

<snip>
close(4)                                = 0
close(4)                                = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD RTMIN], 8) = 0
fork()                                  = 5967
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigaction(SIGINT, {0x8076710, [], SA_RESTORER, 0x40051c68}, {SIG_DFL}, 8) = 0
waitpid(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 5966
waitpid(-1,  <unfinished ...>
<snip>

Launching programs like xterm or csh can be interrupted with CTRL-C, but
when I run bash, I get the following errors in strace and the process won't
terminate when I press CTRL-C:

<snip>
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigaction(SIGINT, {0x8076710, [], SA_RESTORER, 0x40051c68}, {0x8087030, [], SA_RESTORER, 0x40051c68}, 8) = 0
waitpid(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0) = 5944
waitpid(-1, 0xbfffe358, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
sigreturn()                             = ? (mask now [CHLD RTMIN])
waitpid(-1, 0xbfffe358, 0)              = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
sigreturn()                             = ? (mask now [CHLD RTMIN])
waitpid(-1, 
<snip>
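
The waitpid()/SIGINT/sigreturn()/waitpid() loop above is what a wait looks
like when the SIGINT handler returns normally and the wait is simply resumed,
so CTRL-C alone never ends it. Whether the kernel or the shell resumes the
call here is not clear from the trace. The sketch below shows the same
visible effect in Python, whose os.waitpid() retries after a handler returns;
restarted_wait_demo is a hypothetical name and it assumes it is called from
the main thread.

```python
import os
import signal
import time


def restarted_wait_demo():
    """Interrupt a waitpid() with SIGINT and show that it keeps waiting."""
    seen = []
    # A handler that only records the signal and returns normally.
    old = signal.signal(signal.SIGINT, lambda signum, frame: seen.append(signum))
    parent = os.getpid()
    child = os.fork()
    if child == 0:
        try:
            for _ in range(2):
                time.sleep(0.2)
                os.kill(parent, signal.SIGINT)  # plays the role of CTRL-C
            time.sleep(0.2)
        finally:
            os._exit(0)
    try:
        # Returns only when the child exits, despite the interrupts.
        pid, status = os.waitpid(child, 0)
    finally:
        signal.signal(signal.SIGINT, old)
    return len(seen), pid == child, os.WIFEXITED(status)
```

If the child being waited for never exits, as seems to be the case in the
bash trace, a wait like this never returns no matter how often it is
interrupted.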

I'm attaching full strace logs.
