On Tue, Dec 29, 2015 at 12:53:22AM +0100, Mike Belopuhov wrote:
> On 28 December 2015 at 22:22, Josh Grosse <[email protected]> wrote:
> > I'm trying to assist Casey Hancock with illegal instruction exceptions,
> > reported earlier:
> >
> > http://marc.info/?t=145103079400015&r=1&w=2
> > http://marc.info/?t=145111278100001&r=1&w=2
> >
> > But I'm very weak on tracking syscalls through the userland .core files
> > Casey has provided. I'm not sure if ktrace(1) will add any value to
> > finding the root cause, which I assume is a branch into data, but I have
> > no clear understanding of how to discern where it's happening, and I
> > I could use some guidance, as otherwise it's just the blind leading
> > the blind.
> >
> > At this time, I've provided Casey with a -current release(8) so I have
> > a source tree I can ensure is in sync with executed binaries. Each
> > failure of a userland program is an illegal instruction, and each time,
> > there's a syscall being executed in frame 0. I've seen poll(2), kevent(2),
> > waitpid(2), and others, and I am unsure how to -- or if I can -- get any
> > value from the .core files produced. These appear to be valid stack traces,
> > from the calling frame, as shown below.
> >
> > A cluestick would be very helpful. I'm sure there's something obvious
> > I'm overlooking. Thanks in advance!
> >
>
> forgive me if i've overlooked something, but when faced with a SIGILL,
> you might want to investigate which instruction is executed that
> causes this. to do this you need to look at program counter in the
> relevant frame so dumping registers and figuring out where does %rip
> point to in the .text segment. please note that %rip value in the frame
> might point to the next instruction.

Thank you, Mike. The frame 0 disassembly just shows a syscall instruction,
which I understand is used on amd64 rather than i386's interrupt 0x80. But
what I know about system calls can be counted with a fist.

In each case the first frame's rip points to the jump-if-below (jb)
following the syscall. I didn't think this was helpful, which is why I
thought I was looking at the wrong thing in the .core files. The actual
syscall code paths are up in kernel space, and not in these .core files,
to my knowledge.

Three examples:

ntpd:

0x00000ee8802c4dd0 <poll+0>:    mov    $0xfc,%eax
0x00000ee8802c4dd5 <poll+5>:    mov    %rcx,%r10
0x00000ee8802c4dd8 <poll+8>:    syscall
0x00000ee8802c4dda <poll+10>:   jb     0xee8802c4dc0 <rand+48>
0x00000ee8802c4ddc <poll+12>:   retq

sftp:

0x000019bdbe8fe2b0 <read+0>:    mov    $0x3,%eax
0x000019bdbe8fe2b5 <read+5>:    mov    %rcx,%r10
0x000019bdbe8fe2b8 <read+8>:    syscall
0x000019bdbe8fe2ba <read+10>:   jb     0x19bdbe8fe2a0 <rresvport+16>
0x000019bdbe8fe2bc <read+12>:   retq

tmux:

0x00000de0c84e8e20 <kevent+0>:  mov    $0x48,%eax
0x00000de0c84e8e25 <kevent+5>:  mov    %rcx,%r10
0x00000de0c84e8e28 <kevent+8>:  syscall
0x00000de0c84e8e2a <kevent+10>: jb     0xde0c84e8e10 <_libc___p_type+16>
0x00000de0c84e8e2c <kevent+12>: retq
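
To make the "rip points at the jb after the syscall" observation a bit more mechanical, here is a rough Python sketch that parses a gdb-style disassembly listing and reports the instruction at a given program counter together with the one before it. The helper names (parse_disassembly, around_pc) are made up for illustration, the listing is the ntpd one from this mail, and the rip value is assumed to be the address of the jb line as described above. It is only a reading aid for the listings, not a way to find the root cause.

```python
import re

# One gdb disassembly line: address, <symbol+offset>:, then the instruction text.
LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+<([^>]+)>:\s+(.*\S)\s*$")


def parse_disassembly(text):
    """Return (address, symbol, instruction) tuples in listing order."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m.group(1), 16), m.group(2), m.group(3)))
    return rows


def around_pc(rows, pc):
    """Return (previous_row, current_row) for the instruction located at pc.

    previous_row is None when pc is the first instruction in the listing.
    Raises ValueError if pc does not start any listed instruction.
    """
    for i, row in enumerate(rows):
        if row[0] == pc:
            return (rows[i - 1] if i > 0 else None), row
    raise ValueError("pc 0x%x is not an instruction boundary in this listing" % pc)


# The ntpd listing from this mail, with whitespace normalized.
NTPD_LISTING = """
0x00000ee8802c4dd0 <poll+0>: mov $0xfc,%eax
0x00000ee8802c4dd5 <poll+5>: mov %rcx,%r10
0x00000ee8802c4dd8 <poll+8>: syscall
0x00000ee8802c4dda <poll+10>: jb 0xee8802c4dc0 <rand+48>
0x00000ee8802c4ddc <poll+12>: retq
"""

if __name__ == "__main__":
    rows = parse_disassembly(NTPD_LISTING)
    # Assumption: frame 0's rip is the address of the jb line, as described above.
    prev, cur = around_pc(rows, 0x00000EE8802C4DDA)
    print("rip is at   <%s>: %s" % (cur[1], cur[2]))
    if prev is not None:
        print("preceded by <%s>: %s" % (prev[1], prev[2]))
```

The same two calls can be pointed at the sftp and tmux listings by swapping in their text and the corresponding jb address; a ValueError would suggest the saved rip lands in the middle of an instruction, which might itself be worth a closer look.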
