Re: [m5-dev] syscall tracer

gblack Fri, 30 Jan 2009 14:39:06 -0800

Quoting Geoffrey Blake <[email protected]>:

> What exactly are you trying to do with making a syscall tracer, Gabe? I
> thought your original problem was GLIBC doing some bizarre pointer
> encryption/decryption and getting it wrong, leading to a
> segmentation fault?


That is the base problem. Since the binary is dynamically linked, I had been
manually searching the Exec trace for system calls to find where the dynamic
linker had mmapped slabs of init and libc. That was/is the
only way to know that something like 0x28faff0ac850 really goes with 0x850 in
the linker (or init or libc? I forget). There are patterns, but they're hard to
remember and they were actually changing when I tried the small TLB size.
Address space randomization seems to be sensitive to small changes in
execution, so unless I was just changing the trace flags I'd have to figure out
the mappings all over again. Then it occurred to me there was an easy way to
automate part of that process which is where this part came from.
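
Roughly, the lookup being automated is something like the sketch below. This is
only an illustration and not the tracer code itself: the Mapping record, the
resolve() helper, the "ld.so" label, and the base address and length in the
example are all made up, chosen so the address above lands on offset 0x850.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    """One mmap call as it might be recorded from a syscall trace (hypothetical)."""
    start: int        # virtual address returned by mmap
    length: int       # length argument passed to mmap
    file_offset: int  # offset argument passed to mmap
    name: str         # label for the file that was mapped


def resolve(mappings: List[Mapping], addr: int) -> Optional[Tuple[str, int]]:
    """Return (name, offset within the file) for addr, or None if unmapped."""
    # Later mmaps can overlay earlier ones, so search the newest first.
    for m in reversed(mappings):
        if m.start <= addr < m.start + m.length:
            return m.name, addr - m.start + m.file_offset
    return None


# Assumed base and size, for illustration only.
mappings = [Mapping(0x28faff0ac000, 0x20000, 0x0, "ld.so")]
hit = resolve(mappings, 0x28faff0ac850)
if hit is not None:
    print(hit[0], hex(hit[1]))
```

With the mmap results recorded automatically, the table only has to be rebuilt
by rerunning, rather than by rereading the Exec trace by hand.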

>
>
>
> To help find that seg fault, I'd suggest going into the kernel and placing
> m5_exit() calls in arch/x86/mm/fault.c in the do_page_fault() where the
> kernel sends a SIGSEGV to user code and that'll help track down when it
> happens the first time, and reduce the cruft that happens after the program
> halts, like printing "Segmentation Fault" to the serial port.  I'm not sure
> a syscall tracer will help with finding the segfault, I have a feeling it's
> all in glibc and some weird corner case in the ISA of the M5 implementation
> that is causing the bug.  This version of glibc causing the fault does work
> on real hardware, correct?

I'm assuming it works on real hardware. The image I'm using is the starter file
system for Gentoo, so if it didn't work there'd be a lot of annoyed people.
What I did was add a trace flag for all faults in x86. Since there are no tlb
miss faults, or at least those work differently, the only ones that should show
up are the page faults. That let me home in on the exact instruction at fault,
and then through a lot of painful pattern matching find the C that spawned it.

I think there's some sort of address mapping issue because the thing that set
the pointer that's being garbled appears to be constructing a linked list for
the heap manager. Obviously those two things shouldn't land on top of each
other. Fortunately, I found the code that manipulates that as well.
Unfortunately I forgot where it was, so I'll have to dumpster dive again. What
I'm going to do is to find the actual address used in both cases, fortunately a
statically defined global address in the faulting case, and try to figure out
which one is in the wrong for using it. It's possible they both are right and
the kernel is mistakenly mapping the same physical page to both addresses. In
that case it'll be a little more "fun" to figure out, but at least I'll be
working with debug information and letting gdb do the heavy lifting. It's also
possible that the kernel is mapping everything right and my page table walker
or TLB code is tripping things up.
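
For the "same physical page at both addresses" case, a check along these lines
would show it. This is a sketch only: it assumes 4 KiB pages and that
(virtual, physical) address pairs can be dumped from somewhere, and
find_aliases() is a made-up helper, not anything in M5.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PAGE_SHIFT = 12  # assumes 4 KiB pages


def find_aliases(translations: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each physical frame to the virtual pages that reach it.

    translations is an iterable of (vaddr, paddr) pairs. Only frames
    reached from more than one virtual page are returned.
    """
    frames = defaultdict(set)
    for vaddr, paddr in translations:
        frames[paddr >> PAGE_SHIFT].add(vaddr >> PAGE_SHIFT)
    return {pf: sorted(vps) for pf, vps in frames.items() if len(vps) > 1}
```

Some aliasing is legitimate (shared mappings, for instance), so a hit is only a
place to start looking, not proof of a bug.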

Gabe

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
