[I am CCing strace mailing list because even if this turns out to be a kernel bug strace seems to be doing something unexpected - more on that below]
Hi, Aleksa has reported the following lockup when stracing the following go program % cat exec.go package main import ( "os" "syscall" ) func main() { syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ()) } $ go version go version go1.6.3 linux/amd64 $ go build -o exec exec.go $ strace -f ./exec [snip] [pid 10349] select(0, NULL, NULL, NULL, {0, 100} <unfinished ...> [pid 10346] <... select resumed> ) = 0 (Timeout) [pid 10346] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...> [pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] <unfinished ...> [pid 10346] <... select resumed> ) = 0 (Timeout) [pid 10349] <... select resumed> ) = 0 (Timeout) execve will never finish unless the strace process get killed with SIGKILL. The following was debugged on 3.12 kernel and the current kernel seems to not trigger the issue as easily but I believe the same problem is there as well. The further investigation shown that the tracer (strace) is stuck waiting for cred_guard_mutex [<0000000000000000>] mm_access+0x22/0xa0 [<0000000000000000>] process_vm_rw_core.isra.1+0x112/0x6c0 [<0000000000000000>] process_vm_rw+0xab/0x110 [<0000000000000000>] SyS_process_vm_readv+0x15/0x20 [<0000000000000000>] system_call_fastpath+0x16/0x1b [<0000000000000000>] 0x7f186f031c3a [<0000000000000000>] 0xffffffffffffffff while the traced process (3 threads) are trying to perform the exec. That means that 2 threads are dead (in fact zombies) waiting in their final schedule. Call Trace: [<ffffffff81057c17>] do_exit+0x6f7/0xa70 [<ffffffff81058009>] do_group_exit+0x39/0xa0 [<ffffffff810670d8>] get_signal_to_deliver+0x1e8/0x5c0 [<ffffffff81002342>] do_signal+0x42/0x670 [<ffffffff810029e8>] do_notify_resume+0x78/0xc0 [<ffffffff8151f4c2>] int_signal+0x12/0x17 [<00007f3a33f3ffb9>] 0x7f3a33f3ffb8 and one is Call Trace: [<ffffffff811a80bf>] flush_old_exec+0xdf/0x890 [<ffffffff811f4197>] load_elf_binary+0x307/0xda0 [<ffffffff811a7b8e>] search_binary_handler+0xae/0x1f0 [<ffffffff811a917e>] do_execve_common.isra.26+0x64e/0x810 [<ffffffff811a95c1>] SyS_execve+0x31/0x50 [<ffffffff8151f7b9>] stub_execve+0x69/0xa0 [<00007f3a33f16527>] 0x7f3a33f16526 waiting for notify_count to drop down to 0 while (sig->notify_count) { __set_current_state(TASK_KILLABLE); spin_unlock_irq(lock); schedule(); <<<<<<<<< if (unlikely(__fatal_signal_pending(tsk))) goto killed; spin_lock_irq(lock); } this however doesn't happen because both threads which are dead are zombies waiting to be reaped by the parent and to call their release_task->__exit_signal. The tracer is blocked on the lock held by exec (in prepare_bprm_creds). This is the case in the current kernel as well AFAICS so the same should be possible as well. So is this a bug or something somewhere else makes sure that this will not happen in newer kernels? Now, I've tried to fix this by diff --git a/kernel/fork.c b/kernel/fork.c index 5a57b9bab85c..d5b7c3aea187 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode) struct mm_struct *mm; int err; - err = mutex_lock_killable(&task->signal->cred_guard_mutex); + err = mutex_lock_interruptible(&task->signal->cred_guard_mutex); if (err) return ERR_PTR(err); but it didn't help and I really couldn't see why until I sprinkled printks all over the exit path and to learn that the parent (strace) doesn't get notified about SIGCHLD when those threads are dying. Why is that? exit_notify do_notify_parent -> return false __group_send_sig_info send_signal __send_signal prepare_signal -> fails because the signal is ignored this is more than unexpected (at least to me) because I would expect that the strace wouldn't ignore SIGCHLD! Is this a bug in strace? Btw. in a successful (not hung) case the tracer is in do_wait when do_notify_parent->__wake_up_parent does the right thing. Thanks! -- Michal Hocko SUSE Labs