Quoting Geoffrey Blake <[email protected]>:

> What exactly are you trying to do with making a syscall tracer Gabe? I
> thought your original problem was happening with GLIBC doing some bizarre
> pointer encryption/decryption and it was getting it wrong leading to a
> segmentation fault?

That is the base problem. What I was doing manually was this: since the binary is dynamically linked, I was searching for system calls in the Exec trace to find where the dynamic linker had mmapped slabs of init and libc. That was/is the only way to know that something like 0x28faff0ac850 really goes with 0x850 in the linker (or init or libc? I forget). There are patterns, but they're hard to remember, and they actually changed when I tried the small TLB size. Address space randomization seems to be sensitive to small changes in execution, so unless I was only changing the trace flags I'd have to figure out the mappings all over again. Then it occurred to me there was an easy way to automate part of that process, which is where this part came from.

> To help find that seg fault, I'd suggest going into the kernel and placing
> m5_exit() calls in arch/x86/mm/fault.c in the do_page_fault() where the
> kernel sends a SIGSEGV to user code and that'll help track down when it
> happens the first time, and reduce the cruft that happens after the program
> halts, like printing "Segmentation Fault" to the serial port. I'm not sure
> a syscall tracer will help with finding the segfault, I have a feeling it's
> all in glibc and some weird corner case in the ISA of the M5 implementation
> that is causing the bug. This version of glibc causing the fault does work
> on real hardware, correct?

I'm assuming it works on real hardware. The image I'm using is the starter file system for Gentoo, so if it didn't work there'd be a lot of annoyed people.

What I did was add a trace flag for all faults in x86. Since there are no TLB miss faults, or at least those work differently, the only ones that should show up are the page faults. That let me home in on the exact instruction at fault, and then, through a lot of painful pattern matching, find the C that spawned it.
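The bookkeeping described above, working out which mmapped object a runtime address like 0x28faff0ac850 belongs to from the mmap() calls in the trace, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool discussed in this thread: the Mapping and AddressMap names, the record fields, and the 0x28faff0ac000 base and "ld.so" label in the usage are all hypothetical, and it assumes the start address, length, and file offset of each mmap() have already been pulled out of the trace.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One mmap() result recovered from the trace (hypothetical record)."""
    start: int        # virtual address mmap() returned
    length: int       # length argument passed to mmap()
    name: str         # object that was mapped (linker, init, libc, ...)
    file_offset: int  # offset argument passed to mmap()


class AddressMap:
    """Resolves runtime addresses to (object name, offset in file) pairs."""

    def __init__(self, mappings):
        # Keep mappings sorted by start address so lookups can bisect.
        self._maps = sorted(mappings, key=lambda m: m.start)
        self._starts = [m.start for m in self._maps]

    def resolve(self, addr):
        # Find the last mapping that starts at or below addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None  # below every known mapping
        m = self._maps[i]
        if addr >= m.start + m.length:
            return None  # falls in a gap between mappings
        return m.name, m.file_offset + (addr - m.start)


if __name__ == "__main__":
    # Hypothetical mapping for the dynamic linker; the base is assumed.
    amap = AddressMap([Mapping(0x28faff0ac000, 0x20000, "ld.so", 0)])
    name, off = amap.resolve(0x28faff0ac850)
    print(name, hex(off))  # prints: ld.so 0x850
```

Because the table is rebuilt from whatever mmap() calls appear in each run's trace, a shift in the randomized layout only changes the input records, not the lookups.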

I think there's some sort of address mapping issue, because the code that set the pointer that's being garbled appears to be constructing a linked list for the heap manager. Obviously those two things shouldn't land on top of each other. Fortunately, I found the code that manipulates that list as well. Unfortunately, I forgot where it was, so I'll have to dumpster dive again. What I'm going to do is find the actual address used in both cases (fortunately a statically defined global address in the faulting case) and try to figure out which one is in the wrong for using it. It's possible they're both right and the kernel is mistakenly mapping the same physical page to both addresses. In that case it'll be a little more "fun" to figure out, but at least I'll be working with debug information and letting gdb do the heavy lifting. It's also possible that the kernel is mapping everything correctly and my page table walker or TLB code is tripping things up.

Gabe

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
