Hi,

after a bit more tinkering I got the instrumented binary to look almost
right, and I think I understand what the next problem is.
Here is the piece of instrumentation code from my example, to explain
myself and check my reasoning:

.dyninst:00422014                 dd 0
.dyninst:00422018                 dd 0
.dyninst:0042201C
.dyninst:0042201C ; =============== S U B R O U T I N E =======================================
.dyninst:0042201C
.dyninst:0042201C
.dyninst:0042201C _main_1         proc near               ; CODE XREF: _mainj
.dyninst:0042201C
.dyninst:0042201C var_14          = dword ptr -14h
.dyninst:0042201C var_4           = dword ptr -4
.dyninst:0042201C
.dyninst:0042201C                 lea     esp, [esp-14h]
.dyninst:00422020                 mov     [esp+14h+var_4], eax
.dyninst:00422024                 lea     eax, [esp+14h]
.dyninst:00422028                 and     esp, 0FFFFFFF0h
.dyninst:0042202B                 mov     [esp+14h+var_14], eax
.dyninst:0042202E                 mov     eax, [eax-4]
.dyninst:00422031                 pusha
.dyninst:00422032                 push    1010h
.dyninst:00422037                 push    ebp
.dyninst:00422038                 mov     ebp, esp
.dyninst:0042203A                 lea     esp, [esp-88h]
.dyninst:00422041                 call    $+5
.dyninst:00422046
.dyninst:00422046 loc_422046:                             ; DATA XREF: _main_1+5Bw
.dyninst:00422046                 pop     ecx
.dyninst:00422047                 mov     eax, [ecx-32h]
.dyninst:0042204A                 mov     edx, [eax]
.dyninst:0042204C                 test    edx, edx
.dyninst:0042204E                 jz      locret_422085
.dyninst:00422054                 mov     edx, 0
.dyninst:00422059                 mov     [eax], edx
.dyninst:0042205B                 mov     edx, 0
.dyninst:00422060                 push    edx
.dyninst:00422061                 mov     [ebp-8], eax
.dyninst:00422064                 mov     [ebp-0Ch], ecx
.dyninst:00422067                 mov     ebx, [ebp-0Ch]
.dyninst:0042206A                 mov     eax, [ebx-2Eh]
.dyninst:0042206D                 call    eax

....
relocated code from main ...
.dyninst:0042208B                 mov     esp, [esp+14h+var_14]
.dyninst:0042208E                 push    ebp
.dyninst:0042208F                 mov     ebp, esp
.dyninst:00422091                 push    offset format   ; "hello"
.dyninst:00422096                 call    _printf
...

.dyninst:00422100 ; Imports from C:\Program Files\Dyninst\lib\dyninstAPI_RT.dll
.dyninst:00422100 ;
.dyninst:00422100 DYNINST_bootstrap_info dd ?
.dyninst:00422104                 align 8
.dyninst:00422108 ;
.dyninst:00422108 ; Imports from libInst.dll
.dyninst:00422108 ;
.dyninst:00422108 incFuncCoverage dd ?

In this test I am using a stripped-down version of the codeCoverage tool.
It instruments the beginning of the function main, which just prints
hello world.

There are two problems here, and I want to check that my reasoning about
them is correct before I start addressing them.

The code above uses a getpc construct (call $+5 followed by pop ecx) to
find its own address in memory so that it can do PC-relative addressing.
It then loads the value at [ecx-32h], which in this case is address
00422014. If my reading and cross-referencing of the ELF code is correct,
that address is supposed to contain a pointer to DYNINST_bootstrap_info
(see the PS). But at runtime it contains NULL, even though the import
itself is properly resolved at 00422100.
I've verified that this reloc is recorded correctly, so my guess is that
it just isn't being written to the produced binary in emitWin.C. Is that
reasoning correct?
The same issue affects "call eax", which is supposed to call the
incFuncCoverage instrumentation function. It gets the pointer from
00422018, which is again NULL, even though the import slot at 00422108
is properly resolved at runtime.
I guess the reloc info needs to be added to the binary to solve this:
fixing the pointers up manually at runtime in a debugger makes the
instrumentation function execute properly.
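
To double-check the address arithmetic in the paragraphs above, here is a
small C++ sketch. The constants are copied from the listing; the helper
name slotAddress and the constant names are made up for illustration and
are not part of Dyninst. It only reproduces what the getpc sequence
computes; it says nothing about what should be stored in those slots.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper (not a Dyninst function): the address read by
// "mov reg, [pcReg - disp]" once the call $+5 / pop sequence has put the
// address of the pop instruction into pcReg.
constexpr std::uint32_t slotAddress(std::uint32_t pc, std::uint32_t disp) {
    return pc - disp;
}

// Values taken from the listing: "call $+5" at 00422041 pushes 00422046,
// which "pop ecx" then loads into ecx.
constexpr std::uint32_t kGetPc = 0x00422046;
constexpr std::uint32_t kGuardSlot = slotAddress(kGetPc, 0x32);  // [ecx-32h]
constexpr std::uint32_t kCallSlot = slotAddress(kGetPc, 0x2E);   // [ebx-2Eh]

// Both slots are the two "dd 0" words just before _main_1.
static_assert(kGuardSlot == 0x00422014, "first pointer slot");
static_assert(kCallSlot == 0x00422018, "second pointer slot");

// Prints the two computed slot addresses in hex.
void printSlots() {
    std::printf("[ecx-32h] -> %08X\n", static_cast<unsigned>(kGuardSlot));
    std::printf("[ebx-2Eh] -> %08X\n", static_cast<unsigned>(kCallSlot));
}
```

So both loads land on the two zero-initialized dwords at 00422014 and
00422018, which matches the NULL values seen at runtime.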

The other problem, and seemingly the bigger one, is the code that is
copied from the main function. In this example, the absolute address of
"hello" (push offset format) would clearly need relocation info in the
mutated binary. Does Dyninst track this info when copying the code, or
would that analysis need to be added too?
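
To make the question concrete, here is a rough C++ sketch of the kind of
bookkeeping I mean. Everything in it is hypothetical: MovedRange and
translateFixups are names invented for this mail, not Dyninst or Windows
APIs, and it assumes the code bytes are copied verbatim, so a fixup keeps
its offset within the copied block. If the relocator re-encodes
instructions, that assumption no longer holds and the mapping would have
to be done per instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: `size` bytes of code copied from origRva to newRva.
struct MovedRange {
    std::uint32_t origRva;
    std::uint32_t newRva;
    std::uint32_t size;
};

// Given the fixup RVAs found in the original relocation table, return the
// RVAs that would need new entries because the 32-bit value they patch now
// also lives inside a copied block. Assumes verbatim copies (see above).
std::vector<std::uint32_t> translateFixups(
        const std::vector<std::uint32_t>& fixups,
        const std::vector<MovedRange>& moved) {
    std::vector<std::uint32_t> out;
    for (std::uint32_t fixup : fixups) {
        for (const MovedRange& r : moved) {
            if (r.size < 4 || fixup < r.origRva) continue;
            const std::uint32_t off = fixup - r.origRva;
            // The whole 4-byte operand must lie inside the copied block.
            if (off <= r.size - 4) out.push_back(r.newRva + off);
        }
    }
    return out;
}
```

For example, with a made-up block of 0x10 bytes copied from RVA 0x1000 to
RVA 0x2208E, an original fixup at RVA 0x1004 would need a matching entry
at RVA 0x22092, while fixups outside the block produce nothing.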

I'm getting to know the codebase pretty well, but there are obviously
parts I haven't studied yet, and I just wanted to make sure I'm not
missing anything that is already being done.

PS
Should it be DYNINST_bootstrap_info or DYNINST_default_tramp_guards? I see
the code that adds DYNINST_default_tramp_guards, but somehow the
DYNINST_bootstrap_info symbol is what gets added in the end.
Is this OK, or is it a separate bug?

Thanks,
Aleks

On Wed, Feb 25, 2015 at 9:33 PM, Bill Williams <[email protected]> wrote:

> On 02/25/2015 02:28 PM, Aleksandar Nikolic wrote:
>
>> I seem to have tracked down the cause of all my issues, at least
>> partially, to this piece of code in binaryEdit:
>>
>> base += (1024*1024);
>> base -= (base & (1024*1024-1));
>>
>> in openFile
>>
>> Now, this base adjustment clearly has a purpose, but if commented out,
>> the instrumented PE file that is produced has a good import table
>> and good trampolines to instrumentation code.
>> I guess it's required for opening other (non PE) files?
>>
> That would be aligning up to the next 1MB boundary, which is an ELF
> requirement. If it doesn't hold for PE files, it should be okay to relax
> that.
>
> (Side note: we really ought to add format_elf and format_pe #defines where
> applicable, mapped appropriately to relevant OSes. If you feel particularly
> motivated, it's fine to use this as the initial motivating test case.)
>
>
>  On 02/25/2015 07:59 PM, Bill Williams wrote:
>>
>>>
>>>  I'll take a look at the patches over the next couple of days, but this
>>>>> all sounds very promising.
>>>>>
>>>>> I don't have a definite answer for the trampoline issue, but I'd look at
>>>>> whether there's a similar issue to the one with the imports where we
>>>>> generated branches before .dyninst was fixed and didn't recalculate
>>>>> them. The springboard code is very good at doing what it's told, so I'd
>>>>> strongly suspect that we moved the section of relocated code after we
>>>>> generated springboards.
>>>>>
>>>>>
>>>> It would seem that that is the case. If I fix the base address
>>>> "manually", it sort of works. As my patch for imports is hacky, is
>>>> there a part of the API that does the recalculations or should I do
>>>> them myself?
>>>>
>>>>  If memory serves, what we do on Linux is ensure that .dyninstInst is
>>> created somewhere fixed (end of the binary, more or less), so that we
>>> can actually generate the code correctly at instrumentation time. That's
>>> going to be the safest/easiest approach, I think--otherwise we might
>>> need to replace 5-byte near branches with longer code sequences and
>>> wholly regenerate the section contents.
>>>
>>>
>>>>  Cheers,
>>>>>> Alex
>>>>>>
>>>>>> On 02/11/2015 06:20 PM, Matthew LeGendre wrote:
>>>>>>
>>>>>>>
>>>>>>> At one point, perhaps 6-7 years ago, a student had windows binary
>>>>>>> rewriting working to the point where you could do basic binary
>>>>>>> rewriting
>>>>>>> on notepad.exe.  They left before finishing the project, and it was
>>>>>>> never feature complete nor functional on complicated binaries.
>>>>>>> You're
>>>>>>> likely seeing the remains of that effort.  I don't know how much of
>>>>>>> that
>>>>>>> code is still valid or useful.
>>>>>>>
>>>>>>> -Matt
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 11 Feb 2015, Aleksandar Nikolic wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> looking at the codebase, a lot of code seems to already be there.
>>>>>>>> I'll be getting to know the code in more details. Any directions
>>>>>>>> into what would need to be implemented or what parts are missing?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On 02/08/2015 10:59 PM, Barton Miller wrote:
>>>>>>>>
>>>>>>>>> BTW, if there are any individuals or groups that would like to
>>>>>>>>> work on
>>>>>>>>> getting rewriting to work on Windows, we'd be happy to provide
>>>>>>>>> support.
>>>>>>>>> Not a small effort but interesting and worthwhile.
>>>>>>>>>
>>>>>>>>> --bart
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/6/2015 4:36 PM, Bill Williams wrote:
>>>>>>>>>
>>>>>>>>>> No, and not exactly. Windows binary rewriting is not supported,
>>>>>>>>>> and is
>>>>>>>>>> documented as such. If it were to be supported, what you are doing
>>>>>>>>>> would work quite reasonably.
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Dyninst-api mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/dyninst-api
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
>
> --
> --bw
>
> Bill Williams
> Paradyn Project
> [email protected]
>
