Hi there, I've finally gotten around to taking a serious look at the PPC support in ltrace this week. I'd like to present my conclusions, both to let people interested in ltrace know, and to keep this archived for posterity.
First, let's deal with the case of 32-bit PPC. There are two PLT types on 32-bit PPC: the old-style BSS PLT, and the new-style "secure" PLT. We can tell one from the other by the flags on the .plt section: if it's +X (executable), it's BSS PLT, otherwise it's secure. BSS PLT works the same way as on most architectures: the .plt section contains trampolines and we put breakpoints on those. So this case is trivial to take care of.

With secure PLT, the .plt section doesn't contain instructions but addresses; the real PLT table is stored in .text. Addresses of those PLT entries can be computed. Some time ago I adapted the computation that BFD does; it used to live in ltrace-elf.c, and on my libs branch (more on this in some other e-mail) I moved it to the PPC back end, where it belongs.

If not prelinked, BSS PLT entries in the .plt section contain zeroes that are overwritten by the dynamic linker during start-up. For that reason, ltrace enables those breakpoints only after .start is hit. This was all rather straightforward (unless I forgot about some corner case, that is), and the libs branch currently happily traces PPC32 processes: BSS or secure, prelinked or not, stripped or not, straight or cross.

The 64-bit PPC case is much more involved. PPC64 only ever has secure PLTs. Right now, tracing of PPC64 processes is done by reading the contents of .plt (which contains addresses) and putting breakpoints on those addresses. Thus ltrace traces entry points, not library calls, and intra-library calls fire as well. That's not bad per se, but there is a separate mechanism for that: the -x option.

On PPC64, callers call _stubs_, not PLT entries, and there may be more than one stub for each PLT entry. Stubs play the role of the PLT: they read a value from .plt and dispatch the call. PLT entries themselves are essentially just curried calls to the resolver: all they do is set up r0 with the number of the function to be resolved, and call the resolver. When a symbol is resolved, the resolver updates the value stored in .plt, and so the PLT entry is only ever called once, if at all. Correspondingly, PLT entries are useless as breakpoint sites.

When not prelinked, .plt contains zeroes, and we have to wait for the dynamic linker to fill it with addresses of PLT entries. When prelinked, .plt is initialized to contain the target addresses right away, and the stub thus completely avoids calling the PLT entry.

So it seems that stubs are where we would like to have breakpoints. Which is kind of tricky, because there doesn't seem to be any way to reliably compute the addresses of stubs. There are symbols for stubs in the symbol table (look for xxxxxxxx.plt_call.name, xxxxxxxx being a hex number), but those go away when the binary is stripped, and therefore can't be relied on. The stubs appear to always be at the beginning of their respective section (typically .text), but I don't know whether that's any rule; it seems it's like that just because I happened to look into binaries that use the standard linker script. Generally they are ordered differently from the PLT entries. Figuring out which stub belongs to which PLT entry is possible, but it implies assumptions and disassembling, and is a mess. We might still use the stubs if they are in the symbol table, but for the general case, something else is necessary.

So when a process is started, we do two things: inspect the .plt section, and put breakpoints on the PLT entries (whose addresses we compute like on PPC32). If the binary was prelinked, .plt is filled with addresses. We remember those and rewrite them to point to the PLT entries instead.
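To make the prelinked case concrete, here is a minimal sketch of the slot rewriting using plain ptrace. This is my illustration, not the actual ltrace code; the plt_slot structure and redirect_plt_slot() are names made up for the example, and it reads and writes a single word of the slot:

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>

    struct plt_slot {
        uintptr_t addr;    /* address of the slot inside .plt */
        uintptr_t target;  /* remembered original target address */
    };

    /* Remember the prelinked target stored in one .plt slot, then
     * redirect the slot to the corresponding PLT entry, where our
     * breakpoint sits.  Hypothetical helper, not an ltrace interface. */
    static int
    redirect_plt_slot(pid_t pid, struct plt_slot *slot, uintptr_t plt_entry)
    {
        errno = 0;
        long val = ptrace(PTRACE_PEEKTEXT, pid,
                          (void *)slot->addr, (void *)0);
        if (val == -1 && errno != 0)
            return -1;
        slot->target = (uintptr_t)val;
        if (ptrace(PTRACE_POKETEXT, pid, (void *)slot->addr,
                   (void *)plt_entry) < 0)
            return -1;
        return 0;
    }

When the breakpoint on the PLT entry later hits, we set the IP to the remembered slot->target, as described next.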
When a call is made, the corresponding PLT breakpoint hits. We don't even bother removing and re-enabling the breakpoint; instead we simply move the IP to the previously-remembered address. The .plt changes should be undone before a process is detached, but even if they are not, the worst that happens is that those symbols are resolved again.

If the binary was not prelinked, we need to smuggle a breakpoint somewhere where it hits every time that .plt is updated: a post-fixup breakpoint. One way to do this is to put a breakpoint on ._dl_fixup (or .fixup in older linkers, or whatever else in non-standard linkers). When this hits, we remove that breakpoint and put a breakpoint on the return address instead (read from the link register), which will be somewhere in ._dl_runtime_resolve. That is our post-fixup breakpoint. Another way is to scan ._dl_runtime_resolve, find a "bctr" instruction, and put the breakpoint there. Neither is too nice or robust. Yet another way is to single-step the process through the resolver and watch for changes in .plt (see the sketch at the end of this mail). That seems like the most robust solution.

When the post-fixup breakpoint hits, we know a .plt slot was updated, and treat it as if it were prelinked, as described above.

The whole resolver gambit has to be handled the same way as breakpoint re-enablement in multi-threaded processes: when a PLT breakpoint hits for the first time, we stop all threads before proceeding. Single-stepping one thread through the resolver is not by itself enough, because there's a brief window after the value is updated but before ltrace gets to handle it, and in that window other threads may race through and ltrace misses those calls.

When we attach to a running process, .plt will be in a mixed state: some slots were already fixed up, some weren't. We can easily tell the former case from the latter by comparing the address stored therein with the address of the corresponding PLT entry.

As a bonus, I'll mention one blind alley that I went down. Under this scheme, we would put breakpoints on PLT entries. When these hit, we would read the return address, which leads back to the original caller. Decoding the one instruction before the return address would give us the stub address, and we would put a breakpoint there. That's tempting, but unfortunately there is generally more than one stub per PLT entry, and a breakpoint would be set only in the first stub called. So this would need to be combined with the post-fixup scheme as well, and would eventually simply collapse to the previous case, except that breakpoints would hit in stubs instead of in the PLT.

As a pleasant side effect, the post-fixup scheme takes care of tracing re-entrant calls, and of the case where several threads make the same first-time PLT call in parallel, both of which are currently broken. The deal is that we know that the address in .plt changes after the first call, but we don't get around to updating the breakpoint until the function returns, mostly because we don't know when that happens. Unfortunately this means that we fail to trace any calls made between the first call and the first return, as we don't have breakpoints in the right places. I don't see any other way to fix this than putting a breakpoint somewhere in the dynamic linker, which is what I propose above.
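And here is the promised sketch of the single-stepping variant. Again, this is my illustration, not ltrace code; step_until_slot_changes() is a made-up name, error handling is elided, and, as discussed above, the other threads are assumed to be stopped while this runs:

    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    /* Single-step the tracee through the resolver until the .plt slot
     * under resolution changes, then return its new value so that the
     * caller can treat it like a prelinked slot.  All other threads
     * are assumed to be stopped, otherwise calls may be missed. */
    static long
    step_until_slot_changes(pid_t pid, uintptr_t slot_addr, long old_value)
    {
        long val = old_value;
        int status;

        do {
            ptrace(PTRACE_SINGLESTEP, pid, (void *)0, (void *)0);
            waitpid(pid, &status, 0);
            val = ptrace(PTRACE_PEEKTEXT, pid,
                         (void *)slot_addr, (void *)0);
        } while (val == old_value);

        return val;  /* the resolver wrote the new target here */
    }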
Thanks,
PM