Re: [Lldb-commits] [PATCH] Profile Assembly Until Ret Instruction

Jason Molenda Tue, 19 Aug 2014 16:33:56 -0700

That will only happen on i386, and only if the code is built with 
-fomit-frame-pointer.  And on Mac OS X with the current generation tools, we 
don't emit eh_frame instructions for i386/x86_64 any more.


I'd like to try living off eh_frame and see how it goes.  For i386/x86_64 code, 
if the code uses ebp as the frame pointer (instead of using it as a scratch 
reg), I think it will be fine - my main concern is that eh_frame is not 
guaranteed to describe the prologue or epilogue.  There was enough register 
pressure on i386 that using ebp as a scratch was tempting but on x86_64 there's 
little reason to bother.



> On Aug 19, 2014, at 4:29 PM, Greg Clayton <gclay...@apple.com> wrote:
> 
> The EH frame doesn't track the PIC bump stuff and that can/will hose up 
> stepping.
> 
>> On Aug 19, 2014, at 4:22 PM, Jason Molenda <jmole...@apple.com> wrote:
>> 
>> Hi Tong, my message was a little rambling.  Let's be specific.
>> 
>> We are changing lldb to trust eh_frame instructions on the 
>> currently-executing aka 0th frame.
>> 
>> In practice, gcc and clang eh_frame both describe the prologue, so this is 
>> OK.
>> 
>> Old gcc and clang eh_frame do not describe the epilogue.  So we need to add 
>> a pass for i386/x86_64 (at least) to augment the eh_frame-sourced unwind 
>> instructions.  I don't know if it would be best to augment eh_frame 
>> UnwindPlans when we create them in DWARFCallFrameInfo or if it would be 
>> better to do it lazily when we are actually using the unwind instructions in 
>> RegisterContextLLDB (probably RegisterContextLLDB like you were doing).  We 
>> should only do it once for a given function, of course.
>> 
>> I think it would be cleanest if the augmentation function lived in the 
>> UnwindAssembly class.  But I haven't looked at how easy it is to get an 
>> UnwindAssembly object where we need it.
>> 
>> 
>> Thanks for taking this on.  It will be interesting to try living entirely 
>> off eh_frame and see how that works for all the architectures/environments 
>> lldb supports.
>> 
>> I worry a little that we're depending on the generous eh_frame from 
>> clang/gcc and if we try to run on icc (Intel's compiler) or something like 
>> that, we may have no prologue instructions and stepping will work very 
>> poorly.  But we'll cross that bridge when we get to it.
>> 
>> 
>> 
>>> On Aug 15, 2014, at 8:07 PM, Jason Molenda <jmole...@apple.com> wrote:
>>> 
>>> Hi Tong, sorry for the delay in replying.
>>> 
>>> I have a couple thoughts about the patch.  First, the change in 
>>> RegisterContextLLDB::GetFullUnwindPlanForFrame() forces the use of eh_frame 
>>> unwind instructions ("UnwindPlanAtCallSite" - which normally means the 
>>> eh_frame unwind instructions) for the currently-executing aka zeroth frame. 
>>>  We've talked about this before, but it's worth noting that this patch 
>>> includes that change. 
>>> 
>>> There's still the problem of detecting how *asynchronous* those eh_frame 
>>> unwind instructions are.  For instance, what do you get for an i386 program 
>>> that does
>>> 
>>> #include <stdio.h>
>>> int main()
>>> {
>>> puts ("HI");
>>> }
>>> 
>>> Most codegen will use a sequence like
>>> 
>>> call LNextInstruction
>>> LNextInstruction:
>>> pop ebx
>>> 
>>> This call & pop sequence is establishing the "pic base"; the program 
>>> will then use that address to find the "HI" constant data.  If you compile 
>>> this with -fomit-frame-pointer, so we have to use the stack pointer to find 
>>> the CFA, do the eh_frame instructions describe this?
>>> 
>>> It's a bit of an extreme example but it's one of those tricky cases where 
>>> asynchronous ("accurate at every instruction") unwind instructions and 
>>> synchronous ("accurate at places where we can throw an exception, or a 
>>> callee can throw an exception") unwind instructions are different.
>>> 
>>> 
>>> I would use behaves_like_zeroth_frame instead of if (IsFrameZero()) because 
>>> you can have a frame in the middle of the stack which was the zeroth frame 
>>> when an asynchronous signal came in -- in which case, the "callee" stack 
>>> frame will be sigtramp.
>>> 
>>> 
>>> You'd want to update the UnwindLogMsgVerbose() text, of course.
>>> 
>>> 
>>> What your DWARFCallFrameInfo::PatchUnwindPlanForX86() function is doing is 
>>> assuming that the unwind plan fails to include an epilogue description, 
>>> and stepping through all the instructions in the function looking for the 
>>> epilogue.
>>> 
>>> DWARFCallFrameInfo doesn't seem like the right place for this.  There's an 
>>> assumption that the instructions came from eh_frame and that they are 
>>> incomplete.  It seems like it would more naturally live in the 
>>> UnwindAssembly plugin and it would have a name like 
>>> AugmentIncompleteUnwindPlanWithEpilogue or something like that.
>>> 
>>> What if the CFI already does describe the epilogue?  I imagine we'll just 
>>> end up with a doubling of UnwindPlan Rows that describe the epilogue 
>>> instructions.
>>> 
>>> What if we have a mid-function epilogue?  I've never seen gcc/clang 
>>> generate these for x86, but it's possible.  It's a common code sequence on 
>>> arm/arm64.  You can see a messy bit of code in 
>>> UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly which 
>>> handles these -- saving the UnwindPlan's unwind instructions when we see 
>>> the beginning of an epilogue, and once the epilogue is complete, restoring 
>>> the unwind instructions.
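The save/restore idea for mid-function epilogues can be sketched with a toy model. This is not the lldb code: the `Insn` kinds, the `Row` struct and `BuildRows` are hypothetical names, the instruction stream is x86_64-flavoured rather than arm, and only the CFA rule is tracked (no register save locations).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DWARF register numbers on x86_64.
constexpr int kRegRbp = 6;
constexpr int kRegRsp = 7;

// Hypothetical, heavily simplified instruction kinds.
enum class Insn { PushRbp, MovRspToRbp, PopRbp, Ret, Other };

// "From instruction `index` onward, CFA = cfa_reg + cfa_offset."
struct Row {
    std::size_t index;
    int cfa_reg;
    int cfa_offset;
};

std::vector<Insn> ExampleInsns() {
    // Prologue, body, early return (mid-function epilogue), more body, final epilogue.
    return {Insn::PushRbp, Insn::MovRspToRbp, Insn::Other, Insn::PopRbp,
            Insn::Ret,     Insn::Other,       Insn::PopRbp, Insn::Ret};
}

std::vector<Row> BuildRows(const std::vector<Insn> &insns) {
    assert(!insns.empty());
    std::vector<Row> rows;
    Row current{0, kRegRsp, 8};  // on entry the return address is at [rsp]
    rows.push_back(current);
    Row saved = current;
    bool have_saved = false;
    for (std::size_t i = 0; i < insns.size(); ++i) {
        Row next = current;  // rule in effect after instruction i has executed
        switch (insns[i]) {
        case Insn::PushRbp:
            if (next.cfa_reg == kRegRsp) next.cfa_offset += 8;
            break;
        case Insn::MovRspToRbp:
            next.cfa_reg = kRegRbp;
            break;
        case Insn::PopRbp:
            // Epilogue begins: remember the rule that was valid in the body.
            saved = current;
            have_saved = true;
            next.cfa_reg = kRegRsp;
            next.cfa_offset = 8;
            break;
        case Insn::Ret:
            // Code after a mid-function ret runs with the body's frame again.
            if (have_saved) {
                next = saved;
                have_saved = false;
            }
            break;
        case Insn::Other:
            break;
        }
        const bool changed = next.cfa_reg != current.cfa_reg ||
                             next.cfa_offset != current.cfa_offset;
        if (changed && i + 1 < insns.size()) {
            next.index = i + 1;
            rows.push_back(next);
        }
        current = next;
    }
    return rows;
}
```

For `ExampleInsns()` this yields six rows; the one at index 5 (the instruction after the early `ret`) goes back to rbp+16, which is the row a single-epilogue scan would never produce. Being a static walk, it still knows nothing about flow control.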
>>> 
>>> 
>>> I'm not opposed to the patch - but it does make the assumption that we're 
>>> going to use eh_frame for the currently executing function and that the 
>>> eh_frame instructions do not include a description of the epilogue.  (and 
>>> that there is only one epilogue in the function).  Mostly I want to call 
>>> all of those aspects out so we're clear what we're talking about here.  
>>> Let's clean it up a bit, put it in and see how it goes.
>>> 
>>> J
>>> 
>>> 
>>>> On Aug 14, 2014, at 6:31 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> 
>>>> Hi Jason,
>>>> 
>>>> Turns out we still need CFI for frame 0 in certain situations...
>>>> 
>>>> A possible approach is to disassemble machine code, and manually adjust 
>>>> CFI for frame 0. For example, if we see "pop ebp; => ret", we set cfa to 
>>>> [esp]; if we see "call next-insn; => pop %ebp", we set cfa_offset+=4.
>>>> 
>>>> Patch attached, now it just implements adjustment for "pop ebp; ret".
>>>> 
>>>> If you think this approach is OK, I will go ahead and add other 
>>>> tricks (i386 pc-relative addressing, more styles of epilogue, etc).
>>>> 
>>>> Thank you for your time!
>>>> 
>>>> 
>>>> On Thu, Jul 31, 2014 at 12:50 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> I think gdb's rationale for using CFI for leaf functions is:
>>>> - gcc always generates CFI for the prologue, so at function entry, we 
>>>> know the correct CFA;
>>>> - any stack-pointer-altering operation after that (mid-function & 
>>>> epilogue) we can recognize and handle.
>>>> So basically, it assumes 2, hacks its way through 3 & 4, and pretends we 
>>>> are at 5.
>>>> The number of hacks we need seems to be small in the x86 world, so this 
>>>> tradition is still here.
>>>> 
>>>> Here's what gdb does for the epilogue: normally when you run 'n', it will 
>>>> run one instruction at a time till the next line/different stack id. But 
>>>> when it sees "pop %rbp; ret", it won't step into these instructions. 
>>>> Instead it will execute past them directly.
>>>> I didn't experiment with x86 pc-relative addressing; but I guess it will 
>>>> also recognize and execute past this pattern directly.
>>>> 
>>>> So for compiler generated functions, what we do now with the assembly 
>>>> parser can be done with CFI + those gdb hacks.
>>>> And for hand-written assembly, I think CFI is almost always precise at 
>>>> instruction level. In this case, utilizing CFI instead of the assembly 
>>>> parser will be a big help.
>>>> 
>>>> So maybe we can apply those hacks, and trust CFI only for x86 & x86_64 
>>>> targets?
>>>> 
>>>> 
>>>> On Thu, Jul 31, 2014 at 12:02 AM, Jason Molenda <jmole...@apple.com> wrote:
>>>> I think we could think of five levels of eh_frame information:
>>>> 
>>>> 
>>>> 1 unwind instructions at exception throw locations & locations where a 
>>>> callee may throw an exception
>>>> 
>>>> 2 unwind instructions that describe the prologue
>>>> 
>>>> 3 unwind instructions that describe the epilogue at the end of the function
>>>> 
>>>> 4 unwind instructions that describe mid-function epilogues (I see these on 
>>>> arm all the time, don't see them on x86 with compiler generated code - but 
>>>> we don't use eh_frame on arm at Apple, I'm just mentioning it for 
>>>> completeness)
>>>> 
>>>> 5 unwind instructions that describe any changes mid-function needed to 
>>>> unwind at all instructions ("asynchronous unwind information")
>>>> 
>>>> 
>>>> The eh_frame section only guarantees #1.  gcc and clang always do #1 and 
>>>> #2.  Modern gcc's do #3.  I don't know if gcc would do #4 on arm but it's 
>>>> not important, I just mention it for completeness.  And no one does #5 (as 
>>>> far as I know), even in the DWARF debug_frame section.
>>>> 
>>>> I think it may be possible to detect if an eh_frame entry fulfills #3 by 
>>>> looking if the CFA definition on the last row is the same as the initial 
>>>> CFA definition.  But I'm not sure how a debugger could use heuristics to 
>>>> determine much else.
>>>> 
>>>> 
>>>> In fact, detecting #3 may be the easiest thing to detect.  I'm not sure if 
>>>> the debugger could really detect #2 except maybe if the function had a 
>>>> standard prologue (push rbp, mov rsp rbp) and the eh_frame didn't describe 
>>>> the effects of these instructions, the debugger could know that the 
>>>> eh_frame does not describe the prologue.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Jul 30, 2014, at 6:58 PM, Tong Shen <endlessr...@google.com> wrote:
>>>>> 
>>>>> Ah I understand now.
>>>>> 
>>>>> Now the prologue seems to always be included in CFI for gcc & clang; and 
>>>>> newer gcc includes the epilogue as well.
>>>>> Maybe we can detect and use them when they are available?
>>>>> 
>>>>> 
>>>>> On Wed, Jul 30, 2014 at 6:44 PM, Jason Molenda <jmole...@apple.com> wrote:
>>>>> Ah, it looks like gcc changed since I last looked at its eh_frame output.
>>>>> 
>>>>> It's not a bug -- the eh_frame unwind instructions only need to be 
>>>>> accurate at instructions where an exception can be thrown, or where a 
>>>>> callee function can throw an exception.  There's no requirement to 
>>>>> include prologue or epilogue instructions in the eh_frame.
>>>>> 
>>>>> And unfortunately from lldb's perspective, when we see eh_frame we'll 
>>>>> never know how descriptive it is.  If it's old-gcc or clang, it won't 
>>>>> include epilogue instructions.  If it's from another compiler, it may not 
>>>>> include any prologue/epilogue instructions at all.
>>>>> 
>>>>> Maybe we could look over the UnwindPlan rows and see if the CFA 
>>>>> definition of the last row matches the initial row's CFA definition.  
>>>>> That would show that the epilogue is described.  Unless it is a tail-call 
>>>>> (aka noreturn) function - in which case the stack is never restored.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 30, 2014, at 6:32 PM, Tong Shen <endlessr...@google.com> wrote:
>>>>>> 
>>>>>> GCC seems to generate a row for epilogue.
>>>>>> Do you think this is a clang bug, or at least a discrepancy between 
>>>>>> clang & gcc?
>>>>>> 
>>>>>> Source:
>>>>>> int f() {
>>>>>>    puts("HI\n");
>>>>>>    return 5;
>>>>>> }
>>>>>> 
>>>>>> Compile option: only -g
>>>>>> 
>>>>>> gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
>>>>>> clang version 3.5.0 (213114)
>>>>>> 
>>>>>> Env: Ubuntu 14.04, x86_64
>>>>>> 
>>>>>> dwarfdump -F of clang binary:
>>>>>> <    2><0x00400530:0x00400559><f><fde offset 0x00000088 length: 
>>>>>> 0x0000001c><eh aug data len 0x0>
>>>>>>      0x00400530: <off cfa=08(r7) > <off r16=-8(cfa) >
>>>>>>      0x00400531: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>>>>      0x00400534: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>>>> 
>>>>>> dwarfdump -F of gcc binary:
>>>>>> <    1><0x0040052d:0x00400542><f><fde offset 0x00000070 length: 
>>>>>> 0x0000001c><eh aug data len 0x0>
>>>>>>      0x0040052d: <off cfa=08(r7) > <off r16=-8(cfa) >
>>>>>>      0x0040052e: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>>>>      0x00400531: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>>>>      0x00400541: <off cfa=08(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jul 30, 2014 at 5:43 PM, Jason Molenda <jmole...@apple.com> 
>>>>>> wrote:
>>>>>> I'm open to trying to trust eh_frame at frame 0 for x86_64.  The lack of 
>>>>>> epilogue descriptions in eh_frame is the biggest problem here.
>>>>>> 
>>>>>> When you "step" or "next" in the debugger, the debugger instruction 
>>>>>> steps across the source line until it gets to the next source line.  
>>>>>> Every time it stops after an instruction step, it confirms that it is 
>>>>>> (1) between the start and end pc values for the source line, and (2) 
>>>>>> that the "stack id" (start address of the function + CFA address) is the 
>>>>>> same.  If it stops and the stack id has changed, for a "next" command, 
>>>>>> it will backtrace one stack frame to see if it stepped into a function.  
>>>>>> If so, it sets a breakpoint on the return address and continues.
>>>>>> 
>>>>>> If you switch lldb to prefer eh_frame instructions for x86_64, e.g.
>>>>>> 
>>>>>> Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp
>>>>>> ===================================================================
>>>>>> --- source/Plugins/Process/Utility/RegisterContextLLDB.cpp      
>>>>>> (revision 214344)
>>>>>> +++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp      (working 
>>>>>> copy)
>>>>>> @@ -791,6 +791,22 @@
>>>>>>       }
>>>>>>   }
>>>>>> 
>>>>>> +    // For x86_64 debugging, let's try using the eh_frame instructions 
>>>>>> even if this is the currently
>>>>>> +    // executing function (frame zero).
>>>>>> +    Target *target = exe_ctx.GetTargetPtr();
>>>>>> +    if (target
>>>>>> +        && (target->GetArchitecture().GetCore() == 
>>>>>> ArchSpec::eCore_x86_64_x86_64h
>>>>>> +            || target->GetArchitecture().GetCore() == 
>>>>>> ArchSpec::eCore_x86_64_x86_64))
>>>>>> +    {
>>>>>> +        unwind_plan_sp = func_unwinders_sp->GetUnwindPlanAtCallSite 
>>>>>> (m_current_offset_backed_up_one);
>>>>>> +        int valid_offset = -1;
>>>>>> +        if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp, valid_offset))
>>>>>> +        {
>>>>>> +            UnwindLogMsgVerbose ("frame uses %s for full UnwindPlan, 
>>>>>> preferred over assembly profiling on x86_64", 
>>>>>> unwind_plan_sp->GetSourceName().GetCString());
>>>>>> +            return unwind_plan_sp;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>   // Typically the NonCallSite UnwindPlan is the unwind created by 
>>>>>> inspecting the assembly language instructions
>>>>>>   if (behaves_like_zeroth_frame)
>>>>>>   {
>>>>>> 
>>>>>> 
>>>>>> you'll find that you have to "next" twice to step out of a function.  
>>>>>> Why?  With a simple function like:
>>>>>> 
>>>>>> * thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25 at a.c:5, 
>>>>>> queue = 'com.apple.main-thread', stop reason = step over
>>>>>>  #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5
>>>>>> 2    int foo ()
>>>>>> 3    {
>>>>>> 4        puts("HI");
>>>>>> -> 5        return 5;
>>>>>> 6    }
>>>>>> 7
>>>>>> 8    int bar ()
>>>>>> (lldb) disass
>>>>>> a.out`foo at a.c:3:
>>>>>> 0x100000ea0:  pushq  %rbp
>>>>>> 0x100000ea1:  movq   %rsp, %rbp
>>>>>> 0x100000ea4:  subq   $0x10, %rsp
>>>>>> 0x100000ea8:  leaq   0x6b(%rip), %rdi          ; "HI"
>>>>>> 0x100000eaf:  callq  0x100000efa               ; symbol stub for: puts
>>>>>> 0x100000eb4:  movl   $0x5, %ecx
>>>>>> -> 0x100000eb9:  movl   %eax, -0x4(%rbp)
>>>>>> 0x100000ebc:  movl   %ecx, %eax
>>>>>> 0x100000ebe:  addq   $0x10, %rsp
>>>>>> 0x100000ec2:  popq   %rbp
>>>>>> 0x100000ec3:  retq
>>>>>> 
>>>>>> 
>>>>>> if you do "next" lldb will instruction step, comparing the stack ID at 
>>>>>> every stop, until it gets to 0x100000ec3 at which point the stack ID 
>>>>>> will change.  The CFA address (which the eh_frame tells us is rbp+16) 
>>>>>> just changed to the caller's CFA address because we're about to return.  
>>>>>> The eh_frame instructions really need to tell us that the CFA is now 
>>>>>> rsp+8 at 0x100000ec3.
>>>>>> 
>>>>>> The end result is that you need to "next" twice to step out of a 
>>>>>> function.
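The arithmetic behind the changing stack ID can be worked through with made-up register values. Everything here is an assumption for illustration: the `Regs` struct, the helper names, and the two addresses in `ExampleEntry()` were picked by hand, not taken from a real session.

```cpp
#include <cassert>
#include <cstdint>

struct Regs {
    std::uint64_t rsp;
    std::uint64_t rbp;
};

// Hypothetical register state on entry to foo: rsp points at the return
// address and rbp still holds the caller's frame pointer.
Regs ExampleEntry() { return Regs{0x7fff5fbff8a8ULL, 0x7fff5fbff8c0ULL}; }

// State while stopped in the body of foo, after
//   pushq %rbp; movq %rsp, %rbp; subq $0x10, %rsp
Regs InBody(Regs entry) {
    assert(entry.rsp % 8 == 0);  // stack slots are 8 bytes wide
    Regs r;
    r.rbp = entry.rsp - 8;
    r.rsp = r.rbp - 0x10;
    return r;
}

// State while stopped on retq, after
//   addq $0x10, %rsp; popq %rbp
// Both registers are back to their entry values.
Regs AtRet(Regs entry) { return entry; }

// The rule eh_frame gives for the function body.
std::uint64_t CfaFromRbp(const Regs &r) { return r.rbp + 16; }

// The rule that is actually correct on entry and on the ret.
std::uint64_t CfaFromRsp(const Regs &r) { return r.rsp + 8; }
```

In the body, `CfaFromRbp` gives 0x7fff5fbff8b0, which is foo's CFA. On the `retq` the same rule gives 0x7fff5fbff8d0, which belongs to the caller's frame, while `CfaFromRsp` still gives 0x7fff5fbff8b0. That jump is what makes the stack ID change one instruction early.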
>>>>>> 
>>>>>> AssemblyParse_x86 has a special bit where it looks for the 'ret' 
>>>>>> instruction sequence at the end of the function -
>>>>>> 
>>>>>> // Now look at the byte at the end of the AddressRange for a limited 
>>>>>> attempt at describing the
>>>>>>  // epilogue.  We're looking for the sequence
>>>>>> 
>>>>>>  //  [ 0x5d ] pop %rbp
>>>>>>  //  [ 0xc3 ] ret
>>>>>>  //  [ 0xe8 xx xx xx xx ] call __stack_chk_fail  (this is sometimes the 
>>>>>> final insn in the function)
>>>>>> 
>>>>>>  // We want to add a Row describing how to unwind when we're stopped on 
>>>>>> the 'ret' instruction where the
>>>>>>  // CFA is no longer defined in terms of rbp, but is now defined in 
>>>>>> terms of rsp like on function entry.
>>>>>> 
>>>>>> 
>>>>>> and adds an extra row of unwind details for that instruction.
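A rough sketch of that kind of augmentation, assuming the function's bytes are available as a vector and the unwind rows carry only a CFA rule. `Row`, `FindFinalRet`, `AugmentWithEpilogueRow` and the example helpers are hypothetical names; the real code also has to handle the register save locations, which this leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kRegRbp = 6;  // DWARF register numbers on x86_64
constexpr int kRegRsp = 7;

// "From byte `offset` in the function onward, CFA = cfa_reg + cfa_offset."
struct Row {
    std::size_t offset;
    int cfa_reg;
    int cfa_offset;
};

// Look for "pop %rbp (0x5d); ret (0xc3)" at the very end of the function,
// optionally followed by one 5-byte "call rel32" (0xe8 xx xx xx xx).
bool FindFinalRet(const std::vector<std::uint8_t> &code, std::size_t &ret_offset) {
    const std::size_t n = code.size();
    if (n >= 2 && code[n - 2] == 0x5d && code[n - 1] == 0xc3) {
        ret_offset = n - 1;
        return true;
    }
    if (n >= 7 && code[n - 7] == 0x5d && code[n - 6] == 0xc3 && code[n - 5] == 0xe8) {
        ret_offset = n - 6;
        return true;
    }
    return false;
}

// Returns the rows with one extra row for the ret, unless the last row
// already says CFA = rsp + 8 (epilogue described, or no frame was set up).
std::vector<Row> AugmentWithEpilogueRow(const std::vector<std::uint8_t> &code,
                                        std::vector<Row> rows) {
    assert(!rows.empty());
    std::size_t ret_offset = 0;
    if (!FindFinalRet(code, ret_offset))
        return rows;
    const Row &last = rows.back();
    if (last.cfa_reg == kRegRsp && last.cfa_offset == 8)
        return rows;
    rows.push_back(Row{ret_offset, kRegRsp, 8});
    return rows;
}

// pushq %rbp; movq %rsp, %rbp; popq %rbp; retq
std::vector<std::uint8_t> ExampleCode() { return {0x55, 0x48, 0x89, 0xe5, 0x5d, 0xc3}; }

// Prologue-only rows, in the style of the clang eh_frame output.
std::vector<Row> ExampleRows() {
    return {{0, kRegRsp, 8}, {1, kRegRsp, 16}, {4, kRegRbp, 16}};
}
```

Running it over its own output adds nothing, since the last row is then already rsp+8. That is one way to avoid doubling up rows when the CFI does describe the epilogue.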
>>>>>> 
>>>>>> 
>>>>>> I mention x86_64 as being a possible good test case here because I worry 
>>>>>> about the i386 picbase sequence (call next-instruction; pop %ebx) which 
>>>>>> occurs a lot.  But for x86_64, my main concern is the epilogues.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jul 30, 2014, at 2:52 PM, Tong Shen <endlessr...@google.com> wrote:
>>>>>>> 
>>>>>>> Thanks Jason! That's a very informative post, it clarifies things a 
>>>>>>> lot :-)
>>>>>>> 
>>>>>>> Well I have to admit that my patch is specifically for a certain kind 
>>>>>>> of function, and now I see that's not the general case.
>>>>>>> 
>>>>>>> I did some experiments with gdb. gdb uses CFI for frame 0, on both x86 
>>>>>>> and x86_64. It looks up the FDE for frame 0 and does the CFA 
>>>>>>> calculation according to that.
>>>>>>> 
>>>>>>> - For compiler generated functions: I think there are 2 usage scenarios 
>>>>>>> for frame 0: breakpoint and signal.
>>>>>>>  - Breakpoints are usually at source line boundary instead of 
>>>>>>> instruction boundary, and generally we won't be caught at stack pointer 
>>>>>>> changing locations, so CFI is still valid.
>>>>>>>  - For signals, a synchronous unwind table may not be sufficient. But 
>>>>>>> only stack-changing instructions will cause an incorrect CFA 
>>>>>>> calculation, so it's not always the case.
>>>>>>> - For hand written assembly functions: from what I've seen, most of the 
>>>>>>> time CFI is present and actually asynchronous.
>>>>>>> So it seems that in most cases, even with only synchronous unwind 
>>>>>>> table, CFI is still correct.
>>>>>>> 
>>>>>>> I believe we can trust eh_frame for frame 0 and use assembly profiling 
>>>>>>> as a fallback. If both fail, maybe the code owner should use 
>>>>>>> -fasynchronous-unwind-tables :-)
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <jmole...@apple.com> 
>>>>>>> wrote:
>>>>>>> It was a tricky one and got lost in the shuffle of a busy week.  I was 
>>>>>>> always reluctant to try profiling all the instructions in a function.  
>>>>>>> On x86, compiler generated code (gcc/clang anyway) is very simplistic 
>>>>>>> about setting up the stack frame at the start and only having one 
>>>>>>> epilogue - so anything fancier risked making mistakes and could 
>>>>>>> possibly have a performance impact as we run functions through the 
>>>>>>> disassembler.
>>>>>>> 
>>>>>>> For hand-written assembly functions (which can be very creative with 
>>>>>>> their prologue/epilogue and where it is placed), my position is that 
>>>>>>> they should write eh_frame instructions in their assembly source to 
>>>>>>> tell lldb where to find things.  There are one or two libraries on Mac 
>>>>>>> OS X where we break the "ignore eh_frame for the currently executing 
>>>>>>> function" because there are many hand-written assembly functions in 
>>>>>>> there and the eh_frame is going to beat our own analysis.
>>>>>>> 
>>>>>>> 
>>>>>>> After I wrote the x86 unwinder, Greg and Caroline implemented the arm 
>>>>>>> unwinder where it emulates every instruction in the function looking 
>>>>>>> for prologue/epilogue instructions.  We haven't seen it having a 
>>>>>>> particularly bad impact performance-wise (lldb only does this 
>>>>>>> disassembly for functions that it finds on stacks during an execution 
>>>>>>> run, and it saves the result so it won't re-compute it for a given 
>>>>>>> function).  The clang armv7 codegen often has mid-function epilogues 
>>>>>>> (early returns) which definitely complicated things and made it 
>>>>>>> necessary to step through the entire function bodies.  There's a bunch 
>>>>>>> of code I added to support these mid-function epilogues - I have to 
>>>>>>> save the register save state when I see an instruction which looks like 
>>>>>>> an epilogue, and when I see the final ret instruction (aka restoring 
>>>>>>> the saved lr contents into pc), I re-install the register save state 
>>>>>>> from before the epilogue started.
>>>>>>> 
>>>>>>> These things always make me a little nervous because the instruction 
>>>>>>> analyzer obviously is doing a static analysis so it knows nothing about 
>>>>>>> flow control.  Tong's patch stops when it sees the first CALL 
>>>>>>> instruction - but that's not right, that's just solving the problem for 
>>>>>>> his particular function which doesn't have any CALL instructions before 
>>>>>>> his prologue. :) You could imagine a function which saves a couple of 
>>>>>>> registers, calls another function, then saves a couple more because it 
>>>>>>> needs more scratch registers.
>>>>>>> 
>>>>>>> If we're going to change to profiling deep into the function -- and I'm 
>>>>>>> not opposed to doing that, it's been fine on arm -- we should just do 
>>>>>>> the entire function I think.
>>>>>>> 
>>>>>>> 
>>>>>>> Another alternative would be to trust eh_frame on x86_64 at frame 0.  
>>>>>>> This is one of those things where there's not a great solution.  The 
>>>>>>> unwind instructions in eh_frame are only guaranteed to be accurate for 
>>>>>>> synchronous unwinds -- that is, they are only guaranteed to be accurate 
>>>>>>> at places where an exception could be thrown - at call sites.  So, for 
>>>>>>> instance, there's no reason why the compiler has to describe the 
>>>>>>> function prologue instructions at all.  There's no requirement that the 
>>>>>>> eh_frame instructions describe the epilogue instructions.  The 
>>>>>>> information about spilled registers only needs to be emitted where we 
>>>>>>> could throw an exception, or where a callee could throw an exception.
>>>>>>> 
>>>>>>> clang/gcc both emit detailed instructions for the prologue setup.  But 
>>>>>>> for i386 codegen if the compiler needs to access some pc-relative data, 
>>>>>>> it will do a "call next-instruction; pop %eax" to get the current pc 
>>>>>>> value.  (x86_64 has rip-relative addressing so this isn't needed)  If 
>>>>>>> you're debugging -fomit-frame-pointer code, that means your CFA is 
>>>>>>> expressed in terms of the stack pointer and the stack pointer just 
>>>>>>> changed mid-function --- and eh_frame instructions don't describe this.
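The extra rows that eh_frame leaves out here could be produced by a scan like the following. This is a sketch under several assumptions: the code is i386 built with -fomit-frame-pointer, so the CFA is esp + some offset; the offset in effect just before the sequence is passed in by the caller; and `Row` and `AddPicBaseRows` are hypothetical names. It matches raw bytes without decoding instructions, so a real implementation would have to walk instruction boundaries to avoid false matches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "From byte `offset` in the function onward, CFA = esp + cfa_offset."
struct Row {
    std::size_t offset;
    int cfa_offset;
};

// Finds "call next-instruction" (e8 00 00 00 00) immediately followed by a
// one-byte "pop %reg" (0x58..0x5f) and emits the two rows needed around it.
std::vector<Row> AddPicBaseRows(const std::vector<std::uint8_t> &code,
                                int cfa_offset_before) {
    assert(cfa_offset_before >= 4);  // at least the return address is on the stack
    std::vector<Row> rows;
    for (std::size_t i = 0; i + 5 < code.size(); ++i) {
        const bool call_next = code[i] == 0xe8 && code[i + 1] == 0 &&
                               code[i + 2] == 0 && code[i + 3] == 0 &&
                               code[i + 4] == 0;
        const bool pop_reg = code[i + 5] >= 0x58 && code[i + 5] <= 0x5f;
        if (call_next && pop_reg) {
            // Stopped on the pop: the call has pushed 4 bytes.
            rows.push_back({i + 5, cfa_offset_before + 4});
            // After the pop the stack pointer is back where it was.
            rows.push_back({i + 6, cfa_offset_before});
            i += 5;
        }
    }
    return rows;
}

// sub $0xc, %esp; call next; pop %ebx; nop
std::vector<std::uint8_t> ExampleCode() {
    return {0x83, 0xec, 0x0c, 0xe8, 0x00, 0x00, 0x00, 0x00, 0x5b, 0x90};
}
```

For `ExampleCode()` with a starting offset of 16 (4 for the return address plus the 12 from the `sub`), the CFA is esp+20 while stopped on the `pop` at offset 8 and esp+16 again from offset 9.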
>>>>>>> 
>>>>>>> The end result: If you want accurate unwinds 100% of the time, you 
>>>>>>> can't rely on the unwind instructions from eh_frame.  But they'll get 
>>>>>>> you accurate unwinds 99.9% of the time ...  also, last I checked, 
>>>>>>> neither clang nor gcc describe the epilogue instructions.
>>>>>>> 
>>>>>>> 
>>>>>>> In *theory* the unwind instructions from the DWARF debug_frame section 
>>>>>>> should be asynchronous -- they should describe how to find the CFA 
>>>>>>> address for every instruction in the function.  Which makes sense - you 
>>>>>>> want eh_frame to be compact because it's bundled into the executable, 
>>>>>>> so it should only have the information necessary for exception handling 
>>>>>>> and you can put the verbose stuff in debug_frame DWARF for debuggers.  
>>>>>>> But instead (again, last time I checked), the compilers put the exact 
>>>>>>> same thing in debug_frame even if you use the 
>>>>>>> -fasynchronous-unwind-tables (or whatever that switch was) option.
>>>>>>> 
>>>>>>> 
>>>>>>> So I don't know, maybe we should just start trusting eh_frame at frame 
>>>>>>> 0 and write off those .1% cases where it isn't correct instead of 
>>>>>>> trying to get too fancy with the assembly analysis code.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jul 29, 2014, at 4:17 PM, Todd Fiala <tfi...@google.com> wrote:
>>>>>>>> 
>>>>>>>> Hey Jason,
>>>>>>>> 
>>>>>>>> Do you have any feedback on this?
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> 
>>>>>>>> -Todd
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <endlessr...@google.com> 
>>>>>>>> wrote:
>>>>>>>> Sorry, wrong version of patch...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <endlessr...@google.com> 
>>>>>>>> wrote:
>>>>>>>> Hi Molenda, lldb-commits,
>>>>>>>> 
>>>>>>>> For now, the x86 assembly profiler will stop after 10 "non-prologue" 
>>>>>>>> instructions. In practice that may not be sufficient. For example, we 
>>>>>>>> have a hand-written assembly function which has hundreds of 
>>>>>>>> instructions before the actual (stack-adjusting) prologue instructions.
>>>>>>>> 
>>>>>>>> One way is to change the limit to 1000; but there will always be 
>>>>>>>> functions that break the limit :-) I believe the right thing to do 
>>>>>>>> here is to parse all instructions before "ret"/"call" as prologue 
>>>>>>>> instructions.
>>>>>>>> 
>>>>>>>> Here's what I changed:
>>>>>>>> - For "push %rbx" and "mov %rbx, -8(%rbp)": only add the first row for 
>>>>>>>> that register. They may appear multiple times in the function body, but 
>>>>>>>> as long as one of them appears, the first appearance should be in the 
>>>>>>>> prologue (if it's not in the prologue, the function will not use %rbx, 
>>>>>>>> so these 2 instructions should not appear at all).
>>>>>>>> - Also monitor "add %rsp 0x20".
>>>>>>>> - Remove non prologue instruction count.
>>>>>>>> - Add "call" instruction detection, and stop parsing after it.
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best Regards, Tong Shen
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best Regards, Tong Shen
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> lldb-commits mailing list
>>>>>>>> lldb-commits@cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Todd Fiala |   Software Engineer |     tfi...@google.com |     
>>>>>>>> 650-943-3180
>>>>>>>> 
>>>> <adjust_cfi_for_frame_zero.patch>
>>> 
>> 
> 

