Hi,

After a bit more tinkering I got the instrumented binary to look almost right, and I think I understand what the next problem is. Here is the piece of instrumentation code from my example, to explain myself and check my reasoning:

.dyninst:00422014  dd 0
.dyninst:00422018  dd 0
.dyninst:0042201C
.dyninst:0042201C ; =============== S U B R O U T I N E =======================================
.dyninst:0042201C
.dyninst:0042201C
.dyninst:0042201C _main_1 proc near               ; CODE XREF: _mainj
.dyninst:0042201C
.dyninst:0042201C var_14 = dword ptr -14h
.dyninst:0042201C var_4  = dword ptr -4
.dyninst:0042201C
.dyninst:0042201C  lea     esp, [esp-14h]
.dyninst:00422020  mov     [esp+14h+var_4], eax
.dyninst:00422024  lea     eax, [esp+14h]
.dyninst:00422028  and     esp, 0FFFFFFF0h
.dyninst:0042202B  mov     [esp+14h+var_14], eax
.dyninst:0042202E  mov     eax, [eax-4]
.dyninst:00422031  pusha
.dyninst:00422032  push    1010h
.dyninst:00422037  push    ebp
.dyninst:00422038  mov     ebp, esp
.dyninst:0042203A  lea     esp, [esp-88h]
.dyninst:00422041  call    $+5
.dyninst:00422046
.dyninst:00422046 loc_422046:                     ; DATA XREF: _main_1+5Bw
.dyninst:00422046  pop     ecx
.dyninst:00422047  mov     eax, [ecx-32h]
.dyninst:0042204A  mov     edx, [eax]
.dyninst:0042204C  test    edx, edx
.dyninst:0042204E  jz      locret_422085
.dyninst:00422054  mov     edx, 0
.dyninst:00422059  mov     [eax], edx
.dyninst:0042205B  mov     edx, 0
.dyninst:00422060  push    edx
.dyninst:00422061  mov     [ebp-8], eax
.dyninst:00422064  mov     [ebp-0Ch], ecx
.dyninst:00422067  mov     ebx, [ebp-0Ch]
.dyninst:0042206A  mov     eax, [ebx-2Eh]
.dyninst:0042206D  call    eax
.... relocated code from main ...
.dyninst:0042208B  mov     esp, [esp+14h+var_14]
.dyninst:0042208E  push    ebp
.dyninst:0042208F  mov     ebp, esp
.dyninst:00422091  push    offset format           ; "hello"
.dyninst:00422096  call    _printf
...
.dyninst:00422100 ; Imports from C:\Program Files\Dyninst\lib\dyninstAPI_RT.dll
.dyninst:00422100 ;
.dyninst:00422100 DYNINST_bootstrap_info dd ?
.dyninst:00422104  align 8
.dyninst:00422108 ;
.dyninst:00422108 ; Imports from libInst.dll
.dyninst:00422108 ;
.dyninst:00422108 incFuncCoverage dd ?

In this test example I am using a stripped-down version of the codeCoverage tool; it instruments the beginning of the function main, which just prints hello world.

There are two problems here, and I just want to check that my reasoning about them is correct before I start addressing them.

The code above uses a getpc construction (call $+5 followed by pop ecx) to find its own address in memory for PC-relative addressing, and then retrieves the value from [ecx-32h], which in this case is 00422014. If my reading and cross-referencing of the ELF code is correct, that address is supposed to contain a pointer to DYNINST_bootstrap_info (see the PS note). But in this case it is NULL at runtime, even though the import is properly resolved at 00422104. I've tracked down that this reloc is correctly recorded, but I guess it just isn't added to the produced binary in emitWin.C. Is that reasoning correct?

The same issue affects "call eax", which is supposed to call the incFuncCoverage instrumentation function and gets its pointer from 00422018. That slot is, again, NULL, while the 00422108 import slot is properly resolved at runtime. I guess the reloc info should be added to the binary to solve this, since manually fixing the pointer at runtime via a debugger makes the instrumentation function execute properly.

Another problem, and a bigger one as it seems, is the code that is copied from the main function. In this example, the offset to "hello" would clearly require relocation info in the mutated binary. Does Dyninst track this info when copying the code, or would that analysis need to be added too?

I'm getting to know the codebase pretty well, but there are obviously parts I haven't studied yet, and I just wanted to make sure I didn't miss anything that is already being done.

PS: Should it be DYNINST_bootstrap_info or DYNINST_default_tramp_guards? I see the code that is adding DYNINST_default_tramp_guards, but somehow the DYNINST_bootstrap_info symbol gets added in the end. Is this OK, or is it a separate bug?

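To make the address arithmetic above easy to check, here is a minimal sketch in C++. The helper names slotAddress and rebase are hypothetical and not part of Dyninst. The rebase function only illustrates what a 32-bit PE base relocation (HIGHLOW) does to an absolute operand such as the one in "push offset format"; the image bases passed to it are up to the caller and are not taken from the binary above.

```cpp
#include <cassert>
#include <cstdint>

// Value that `pop ecx` loads: the address right after `call $+5`
// (the call at 0x00422041 is 5 bytes long).
constexpr std::uint32_t kGetPc = 0x00422046;

// Address referenced by a [reg - disp] operand, relative to the getpc value.
// Hypothetical helper, for illustration only.
constexpr std::uint32_t slotAddress(std::uint32_t pc, std::uint32_t disp) {
    return pc - disp;
}

// Effect of a 32-bit HIGHLOW base relocation on an absolute operand:
// the loader adds (actual image base - preferred image base).
// Unsigned arithmetic wraps, so a lower actual base works as well.
constexpr std::uint32_t rebase(std::uint32_t absolute,
                               std::uint32_t preferredBase,
                               std::uint32_t actualBase) {
    return absolute + (actualBase - preferredBase);
}

// [ecx-32h] and [ebx-2Eh] land on the two zero-initialized dwords.
static_assert(slotAddress(kGetPc, 0x32) == 0x00422014, "guard pointer slot");
static_assert(slotAddress(kGetPc, 0x2E) == 0x00422018, "callee pointer slot");
```

The two static_asserts confirm that the loads hit 00422014 and 00422018. Any absolute operand copied into .dyninst without a matching relocation entry would keep its old value if the image were loaded at a different base.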
Thanks,
Aleks

On Wed, Feb 25, 2015 at 9:33 PM, Bill Williams <[email protected]> wrote:

> On 02/25/2015 02:28 PM, Aleksandar Nikolic wrote:
>
>> I seem to have tracked down the cause of all my issues, at least
>> partially, to this piece of code in binaryEdit:
>>
>> base += (1024*1024);
>> base -= (base & (1024*1024-1));
>>
>> in openFile
>>
>> Now, this base adjustment clearly has a purpose, but if commented out,
>> the instrumented PE file that is produced has a good import table
>> and good trampolines to instrumentation code.
>> I guess it's required for opening other (non PE) files?
>>
> That would be aligning up to the next 1MB boundary, which is an ELF
> requirement. If it doesn't hold for PE files, it should be okay to relax
> that.
>
> (Side note: we really ought to add format_elf and format_pe #defines where
> applicable, mapped appropriately to relevant OSes. If you feel particularly
> motivated, it's fine to use this as the initial motivating test case.)
>
>
> On 02/25/2015 07:59 PM, Bill Williams wrote:
>>
>>>>> I'll take a look at the patches over the next couple of days, but this
>>>>> all sounds very promising.
>>>>>
>>>>> I don't have a definite answer for the trampoline issue, but I'd look at
>>>>> whether there's a similar issue to the one with the imports where we
>>>>> generated branches before .dyninst was fixed and didn't recalculate
>>>>> them. The springboard code is very good at doing what it's told, so I'd
>>>>> strongly suspect that we moved the section of relocated code after we
>>>>> generated springboards.
>>>>>
>>>>>
>>>> It would seem that that is the case. If I fix the base address
>>>> "manually", it sort of works. As my patch for imports is hacky, is
>>>> there a part of the API that does the recalculations or should I do
>>>> them myself?
>>>>
>>> If memory serves, what we do on Linux is ensure that .dyninstInst is
>>> created somewhere fixed (end of the binary, more or less), so that we
>>> can actually generate the code correctly at instrumentation time. That's
>>> going to be the safest/easiest approach, I think--otherwise we might
>>> need to replace 5-byte near branches with longer code sequences and
>>> wholly regenerate the section contents.
>>>
>>>
>>>>>> Cheers,
>>>>>> Alex
>>>>>>
>>>>>> On 02/11/2015 06:20 PM, Matthew LeGendre wrote:
>>>>>>
>>>>>>>
>>>>>>> At one point, perhaps 6-7 years ago, a student had windows binary
>>>>>>> rewriting working to the point where you could do basic binary
>>>>>>> rewriting on notepad.exe. They left before finishing the project,
>>>>>>> and it was never feature complete nor functional on complicated
>>>>>>> binaries. You're likely seeing the remains of that effort. I don't
>>>>>>> know how much of that code is still valid or useful.
>>>>>>>
>>>>>>> -Matt
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 11 Feb 2015, Aleksandar Nikolic wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> looking at the codebase, a lot of code seems to already be there.
>>>>>>>> I'll be getting to know the code in more details. Any directions
>>>>>>>> into what would need to be implemented or what parts are missing?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On 02/08/2015 10:59 PM, Barton Miller wrote:
>>>>>>>>
>>>>>>>>> BTW, if there are any individuals or groups that would like to
>>>>>>>>> work on getting rewriting to work on Windows, we'd be happy to
>>>>>>>>> provide support.
>>>>>>>>> Not a small effort but interesting and worthwhile.
>>>>>>>>>
>>>>>>>>> --bart
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/6/2015 4:36 PM, Bill Williams wrote:
>>>>>>>>>
>>>>>>>>>> No, and not exactly. Windows binary rewriting is not supported,
>>>>>>>>>> and is documented as such. If it were to be supported, what you
>>>>>>>>>> are doing would work quite reasonably.
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Dyninst-api mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>>>>>>>>
>
> --
> --bw
>
> Bill Williams
> Paradyn Project
> [email protected]
>
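
For reference, the base adjustment quoted above can be sketched as a standalone C++ function to see what it does to a given address. The function name bumpToNextMB is hypothetical, and the inputs used with it are illustrative values rather than addresses taken from binaryEdit.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kOneMB = 1024 * 1024;

// Same two steps as the quoted snippet: add 1 MB, then clear the low
// 20 bits so the result sits on a 1 MB boundary.
// Hypothetical standalone wrapper, for illustration only.
inline std::uint32_t bumpToNextMB(std::uint32_t base) {
    base += kOneMB;
    base -= (base & (kOneMB - 1));
    return base;
}
```

Note that with this formulation an input that already sits on a 1 MB boundary is still moved up by a full megabyte, so the result is always strictly greater than the input.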
