On 11/02, Dmitry Vyukov wrote:
>
> On Mon, Nov 2, 2015 at 4:13 PM, Oleg Nesterov <[email protected]> wrote:
> > Hi Dmitry,
> >
> > On 11/02, Dmitry Vyukov wrote:
> >>
> >> WARNING: CPU: 1 PID: 1 at kernel/signal.c:334 task_participate_group_stop+0x157/0x1d0()
> >> Modules linked in:
> >> CPU: 1 PID: 1 Comm: init Not tainted 4.3.0 #48
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >>  ffffffff82e40280 ffff88003eb0fae0 ffffffff819efe55 0000000000000000
> >>  ffff88003eb0fb20 ffffffff810ec871 ffffffff8110f4d7 ffff88003eb00000
> >>  ffff88003eb20000 0000000000000000 ffff88003eb0fbf8 ffff88003eb20000
> >> Call Trace:
> >>  [<ffffffff810eca35>] warn_slowpath_null+0x15/0x20 kernel/panic.c:480
> >>  [<ffffffff8110f4d7>] task_participate_group_stop+0x157/0x1d0 kernel/signal.c:334
> >>  [<ffffffff81113587>] do_signal_stop+0x1e7/0x6e0 kernel/signal.c:2060
> >>  [<ffffffff81116ab7>] get_signal+0x387/0x11b0 kernel/signal.c:2316
> >>  [<ffffffff8100cf0d>] do_signal+0x8d/0x19e0 arch/x86/kernel/signal.c:707
> >>  [<ffffffff81005d8d>] prepare_exit_to_usermode+0x11d/0x170 arch/x86/entry/common.c:251
> >>  [<ffffffff81005e83>] syscall_return_slowpath+0xa3/0x2b0 arch/x86/entry/common.c:317
> >>  [<ffffffff82d4f6a7>] int_ret_from_sys_call+0x25/0x8f arch/x86/entry/entry_64.S:281
> >> ---[ end trace f6697fd630b7c361 ]---
> >>
> >>
> >> The reproducer is (needs to be run as root):
> >>
> >> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> >> #include <sys/ptrace.h>
> >> #include <unistd.h>
> >>
> >> int main()
> >> {
> >>   int pid = 1;
> >>   ptrace(PTRACE_ATTACH, pid, 0, 0);
> >>   ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL);
> >>   sleep(1);
> >>   return 0;
> >> }
> >
> > Thanks.
> >
> > Can't reproduce, but at first glance the problem looks clear...
>
> Humm... did you run as root?

Yes,

> It reproduces all the time on my 4.3 kernel VM. Also firmly killed my
> desktop running 3.13.

Yes, it kills init and crashes the kernel. But I do not see the warning.

> >> Yes, it is weird and it kills init right afterwards.
> >
> > Could you confirm that this WARN_ON() happens _after_ the reproducer exits?
> >
> >> But I wasn't able
> >> to figure out what's the root cause (why task does not have
> >> JOBCTL_STOP_PENDING) and maybe the same WARNING can be triggered
> >> without root and/or with other than init process. So still posting it
> >> here.
> >
> > Yes I think you are right. SIGSTOP can race with SIGKILL which (unlike SIGCONT)
> > doesn't clear JOBCTL_STOP_DEQUEUED/PENDING/etc.
> >
> > This is mostly fine, the task won't block in TASK_STOPPED if SIGKILL is pending,
> > but still is not right and leads to the warning above: JOBCTL_STOP_PENDING was not
> > set because do_signal_stop()->task_set_jobctl_pending() checks
> > fatal_signal_pending().

On second thought, in this particular case (your test-case), SIGSTOP/SIGKILL
do not race, although (so far) I think this doesn't matter.

JOBCTL_STOP_PENDING comes from __ptrace_unlink() when the tracee already has
the pending SIGKILL due to PTRACE_O_EXITKILL.

Now. If the tracee (init) wakes up and dequeues SIGKILL before __ptrace_unlink()
adds JOBCTL_STOP_PENDING, it won't see JOBCTL_STOP_PENDING, and this is probably
what happens on my testing machine.

Perhaps __ptrace_unlink() should be more careful too...

> > Probably the patch below should fix the problem, but I'd like to think more
> > before I send the fix.
>
>
> Will test it.

Great, thanks.

Oleg.

