Marcus MERIGHI wrote:
> [email protected] (Stefan Kempf), 2016.01.28 (Thu) 06:48 (CET):
> > Stuart Henderson wrote:
> > > On 2016/01/27 20:10, Stefan Kempf wrote:
> > > > So what I suspect to happen is that:
> > > > - userland does a syscall
> > > > - something goes wrong in the kernel, causing it to call
> > > >   sigexit(SIGILL), terminating the process
> > > > - and the offending instruction you see in the core dump
> > > >   is the 'syscall' instruction.
> > >
> > > If this is the case, perhaps ktrace will give clues.
> >
> > Let's give it a try.
> >
> > Marcus, can you run this as root, please?
> > ktrace /sbin/ping some.domain
> >
> > Or whatever way you invoked ping that made it crash.
> >
> > And send us the output of kdump -f ktrace.out?
>
> # ktrace /sbin/ping 192.168.188.189
> PING 192.168.188.189 (192.168.188.189): 56 data bytes
> 64 bytes from 192.168.188.189: icmp_seq=0 ttl=255 time=3.286 ms
> Illegal instruction

It's close to my guess. This is how I interpret the end of the output:

> # kdump -f ./ktrace.out
> [...]
>  31378 ping     CALL  poll(0x7f7ffffd8790,1,INFTIM)
>  31378 ping     PSIG  SIGALRM caught handler=0x15a413b03050 mask=0<>

The process blocks in a system call, then a signal wakes it up.
Before returning to userspace, sendsig() tries to set up a signal
context. Since the ktrace output stops here, sendsig() must have
called sigexit(SIGILL). This happens when the kernel is not able to
copy the signal context onto the stack of the user process.

Some reasons I can think of:
- the process is at the very bottom of the stack,
- the stack pointer of the user process is trashed, or
- the stack pointer is within the stack area of the process, but it
  points to a page that was not yet mapped in, and uvm_fault() fails
  to fault it in for some reason.

Let's see what the stack pointer looks like when you get the illegal
instruction. Can you try this, please:

$ top

In a different shell (as root):
# procmap <pid of top>

We need to see the lines that say [ stack ]

Now, back in top, hit ctrl+c to make it crash. Then run:
$ gdb -q /usr/bin/top top.core
(gdb) info reg

And send us the output of the 'info reg' command.

>
> # gdb -q /usr/sbin/sshd /sshd.core
> (no debugging symbols found)
> Core was generated by `sshd'.
> Program terminated with signal 4, Illegal instruction.
> (no debugging symbols found)
> Loaded symbols for /usr/sbin/sshd
> Reading symbols from /usr/lib/libutil.so.12.1...done.
> Loaded symbols for /usr/lib/libutil.so.12.1
> Reading symbols from /usr/lib/libcrypto.so.37.0...done.
> Loaded symbols for /usr/lib/libcrypto.so.37.0
> Reading symbols from /usr/lib/libz.so.5.0...done.
> Loaded symbols for /usr/lib/libz.so.5.0
> Reading symbols from /usr/lib/libc.so.84.2...done.
> Loaded symbols for /usr/lib/libc.so.84.2
> Reading symbols from /usr/libexec/ld.so...done.
> Loaded symbols for /usr/libexec/ld.so
> #0  0x00000d9b0d57d52a in select () at <stdin>:2
> 2       <stdin>: No such file or directory.
>         in <stdin>
> (gdb) bt
> #0  0x00000d9b0d57d52a in select () at <stdin>:2
> #1  0x00000d990c00de91 in sshd_hostkey_sign () from /usr/sbin/sshd
> #2  0x00000d990c00b4a1 in ?? () from /usr/sbin/sshd
> #3  0x0000000000000000 in ?? ()
> Current language: auto; currently asm
> (gdb)
>
>
> Thanks for looking, Marcus
