On Tue, Dec 29, 2015 at 12:53:22AM +0100, Mike Belopuhov wrote:
> On 28 December 2015 at 22:22, Josh Grosse <[email protected]> wrote:
> > I'm trying to assist Casey Hancock with illegal instruction exceptions,
> > reported earlier:
> >
> > http://marc.info/?t=145103079400015&r=1&w=2
> > http://marc.info/?t=145111278100001&r=1&w=2
> >
> > But I'm very weak on tracking syscalls through the userland .core files
> > Casey has provided.  I'm not sure if ktrace(1) will add any value to
> > finding the root cause, which I assume is a branch into data, but I have
> > no clear understanding of how to discern where it's happening, and I
> > could use some guidance, as otherwise it's just the blind leading
> > the blind.
> >
> > At this time, I've provided Casey with a -current release(8) so I have
> > a source tree I can ensure is in sync with executed binaries.  Each
> > failure of a userland program is an illegal instruction, and each time,
> > there's a syscall being executed in frame 0.  I've seen poll(2), kevent(2),
> > waitpid(2), and others, and I am unsure how to -- or if I can -- get any
> > value from the .core files produced.  These appear to be valid stack traces,
> > from the calling frame, as shown below.
> >
> > A cluestick would be very helpful.  I'm sure there's something obvious
> > I'm overlooking.  Thanks in advance!
> >
> 
> forgive me if i've overlooked something, but when faced with a SIGILL,
> you might want to investigate which instruction was executed that
> caused it.  to do this you need to look at the program counter in the
> relevant frame, so dump the registers and figure out where %rip
> points in the .text segment. please note that the %rip value in the
> frame might point to the next instruction.
 
Thank you, Mike.  The frame 0 disassembly just shows a syscall, which I
understand is used on amd64 rather than i386's interrupt 0x80.  But
what I know about system calls can be counted with a fist.  In each case
the first frame's rip points to the jump-if-below following the syscall.
I didn't think this was helpful, which is why I thought I was looking at
the wrong thing in the .core files.  The actual syscall code paths are
way up in kernel-space, and not in these .core files, to my knowledge.

Three examples:

ntpd: 

0x00000ee8802c4dd0 <poll+0>:    mov    $0xfc,%eax
0x00000ee8802c4dd5 <poll+5>:    mov    %rcx,%r10
0x00000ee8802c4dd8 <poll+8>:    syscall 
0x00000ee8802c4dda <poll+10>:   jb     0xee8802c4dc0 <rand+48>
0x00000ee8802c4ddc <poll+12>:   retq   

sftp:

0x000019bdbe8fe2b0 <read+0>:    mov    $0x3,%eax
0x000019bdbe8fe2b5 <read+5>:    mov    %rcx,%r10
0x000019bdbe8fe2b8 <read+8>:    syscall 
0x000019bdbe8fe2ba <read+10>:   jb     0x19bdbe8fe2a0 <rresvport+16>
0x000019bdbe8fe2bc <read+12>:   retq   

tmux:

0x00000de0c84e8e20 <kevent+0>:  mov    $0x48,%eax
0x00000de0c84e8e25 <kevent+5>:  mov    %rcx,%r10
0x00000de0c84e8e28 <kevent+8>:  syscall 
0x00000de0c84e8e2a <kevent+10>: jb     0xde0c84e8e10 <_libc___p_type+16>
0x00000de0c84e8e2c <kevent+12>: retq   
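
As a sanity check on the three listings above, here is a minimal sketch
of the address arithmetic.  The addresses are copied from the listings;
the table layout and the names (STUBS, analyze) are made up for
illustration.  It assumes the amd64 syscall instruction is two bytes
(0f 05) and that the jb is the two-byte short form, which the listings
suggest since each jb sits at +10 and the retq at +12.

```python
# Addresses copied from the ntpd, sftp and tmux listings above.
# Each entry: (program, function, syscall address, saved rip, jb target).
STUBS = [
    ("ntpd", "poll",   0x00000ee8802c4dd8, 0x00000ee8802c4dda, 0x00000ee8802c4dc0),
    ("sftp", "read",   0x000019bdbe8fe2b8, 0x000019bdbe8fe2ba, 0x000019bdbe8fe2a0),
    ("tmux", "kevent", 0x00000de0c84e8e28, 0x00000de0c84e8e2a, 0x00000de0c84e8e10),
]

SYSCALL_LEN = 2  # assumption: amd64 `syscall` encodes as 0f 05
JB_SHORT_LEN = 2  # assumption: short `jb` is opcode 72 plus a signed 8-bit offset


def analyze(stub):
    """Check where the saved rip sits relative to the syscall instruction."""
    prog, func, syscall_addr, saved_rip, jb_target = stub
    return {
        "prog": prog,
        "func": func,
        # True when rip is the instruction immediately after `syscall`.
        "rip_follows_syscall": saved_rip - SYSCALL_LEN == syscall_addr,
        # Relative branches are measured from the end of the jb itself.
        "jb_displacement": jb_target - (saved_rip + JB_SHORT_LEN),
    }


results = [analyze(s) for s in STUBS]
for r in results:
    print(r["prog"], r["func"], r["rip_follows_syscall"], r["jb_displacement"])
```

In all three cases the saved rip is exactly two bytes past the syscall
and the jb displacement is identical, so the stubs share one layout.
That is consistent with rip being the return address of a syscall that
was in progress, rather than the address of a bad instruction, though
the arithmetic alone does not prove it.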
