Re: [Lldb-commits] [PATCH] Profile Assembly Until Ret Instruction

Jason Molenda Mon, 25 Aug 2014 01:42:07 -0700

Hi Tong, I read it over and I think we should commit it and try living on it.  
With unwinding there's usually nothing like real world experience with a change 
to find crazy edge cases. :)


You've been looking at this code more than me lately but I think my initial 
definitions in AssemblyParse_x86,

    int m_machine_ip_regnum;
    int m_machine_sp_regnum;
    int m_machine_fp_regnum;

    int m_lldb_ip_regnum;
    int m_lldb_sp_regnum;
    int m_lldb_fp_regnum;


should be changed to uint32_t's - what do you think?  When I wrote the code 
originally the int v. uint32_t mismatches weren't a big deal but this class has 
picked up a lot of static_cast<uint32_t>'s trying to work around this mismatch. 
 

Thanks for working on this!


Jason


On Aug 22, 2014, at 6:37 PM, Tong Shen <endlessr...@google.com> wrote:

> Hi Jason,
> 
> Updated patch attached.
> 
> Augmentation code is in UnwindAssembly-x86.cpp.
> 
> Current augmentation progress:
> - Get eh_frame for the function.
> - Inspect each instruction:
>     * If there's an FDE for this instruction, use it.
>     * Otherwise, see if this instruction will change how we locate CFA and 
> make corresponding changes.
> 
> The code assume there's prologue description.
> It checks whether there's an FDE for the first instruction; if not, 
> augmentation is aborted and we fallback to assembly profiling.
> 
> I only tested it with i386 epilogue and pc relative addressing.
> If you think it's OK, I will write a complete test.
> 
> Thanks for your time!
> 
> 
> On Fri, Aug 22, 2014 at 9:58 AM, Todd Fiala <tfi...@google.com> wrote:
> Ok - holding off on checking this per communication with Tong.  Will see a 
> new patch later today on this.
> 
> 
> On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfi...@google.com> wrote:
> Er I'll "get it" in...  eek..
> 
> 
> On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfi...@google.com> wrote:
> I'm going to test this now.  If it all looks good, I'll ge tit in.
> 
> 
> On Tue, Aug 19, 2014 at 5:01 PM, Tong Shen <endlessr...@google.com> wrote:
> Thanks Jason!
> I will finish this patch and let's see how it goes.
> 
> P.S. I know a little about eh_frame stuff; I added CFI to the new Android 
> ahead-of-time Java compiler so AOT'ed code can properly unwind :-)
> 
> 
> 
> On Tue, Aug 19, 2014 at 4:51 PM, Jason Molenda <jmole...@apple.com> wrote:
> The CIE sets the initial unwind state -- the CIE may describe the unwind 
> state at the first instruction (as it always does with gcc, clang) but in 
> theory it could describe the unwind state once the prologue had executed.
> 
> The idea is that there is one CIE entry which describes a typical 
> at-first-instruction unwind state and then many FDEs that describe the unwind 
> instructions for specific functions - they all use that one CIE.
> 
> Anyway, that's just an implementation detail of eh_frame.  I honestly don't 
> think we should worry about incomplete eh_frame - let's try living on them 
> and see how it works in practice.
> 
> It may be possible to categorize eh_frame to see how complete it is.  
> Compiler-generated x86 prologues are very regular, it would be possible to 
> look at the first few bytes of a function for some pushes or stack pointer 
> changes and see if the eh_frame describes that.  We know what the unwind 
> state is on the first instruction of a function (it's determined by the ABI) 
> -- does the eh_frame have the same instructions?  Can we can through the 
> function for an epilogue, and if we find one, does the eh_frame have unwind 
> instructions there?
> 
> But I don't want to have the perfect be the enemy of the good.  IMO let's 
> take the plunge and try, to use eh_frame and see how that goes.  We can 
> refine it later, or back it out again (it will be a very small change to 
> RegisterContextLLDB) if necessary.
> 
> 
> > On Aug 19, 2014, at 4:41 PM, Tong Shen <endlessr...@google.com> wrote:
> >
> > And for no prologue case:
> > We can detect this easily (any CFI for start address?) and bail out, so we 
> > will fallback to assembly profiler.
> >
> >
> > On Tue, Aug 19, 2014 at 4:36 PM, Tong Shen <endlessr...@google.com> wrote:
> > Ahh sorry I've been working on something else this week and didn't get back 
> > to you in time.
> > And you've been very patient and informative. Thanks!
> >
> > I'm only suggesting it for x86 / x86_64. What I am doing here relies on:
> > - Compiler describes prologue;
> > - We can figure our all mid function CFA changes by inspecting instructions.
> >
> > For frame 0, the new progress for CFA locating will look like this:
> > - Find the nearest CFI available before current PC.
> > - If the CFI is for current PC, viola :-) If not, continue.
> > - Inspect all instructions in between, and make changes to CFA accordingly. 
> > This can solve the PC relative addressing case.
> > - For epilogue, detect if we are in middle of an epilogue. Considering that 
> > there are not many patterns and they are all simple, I think we can 
> > enumerate them and handle accordingly.
> >
> > From what I've seen so far, this actually can solve most of gcc/clang 
> > generated code.
> > For JIT'ed code or hand written assembly, if there's no asynchronous CFI we 
> > are screwed anyway, so trying this won't hurt either (except some extra 
> > running time).\
> >
> > I hope I explain my thoughts clearly.
> >
> > Thank you.
> >
> >
> >
> > On Tue, Aug 19, 2014 at 4:22 PM, Jason Molenda <jmole...@apple.com> wrote:
> > Hi Tong, my message was a little rambling.  Let's be specific.
> >
> > We are changing lldb to trust eh_frame instructions on the 
> > currently-executing aka 0th frame.
> >
> > In practice, gcc and clang eh_frame both describe the prologue, so this is 
> > OK.
> >
> > Old gcc and clang eh_frame do not describe the epilogue.  So we need to add 
> > a pass for i386/x86_64 (at least) to augment the eh_frame-sourced unwind 
> > instructions.  I don't know if it would be best to augment eh_frame 
> > UnwindPlans when we create them in DWARFCallFrameInfo or if it would be 
> > better to do it lazily when we are actually using the unwind instructions 
> > in RegisterContextLLDB (probably RegisterContextLLDB like you were doing).  
> > We should only do it once for a given function, of course.
> >
> > I think it would cleanest if the augmentation function lived in the 
> > UnwindAssembly class.  But I haven't looked how easy it is to get an 
> > UnwindAssembly object where we need it.
> >
> >
> > Thanks for taking this on.  It will be interesting to try living entirely 
> > off eh_frame and see how that works for all the architectures/environments 
> > lldb supports.
> >
> > I worry a little that we're depending on the generous eh_frame from 
> > clang/gcc and if we try to run on icc (Intel's compiler) or something like 
> > that, we may have no prologue instructions and stepping will work very 
> > poorly.  But we'll cross that bridge when we get to it.
> >
> >
> >
> > > On Aug 15, 2014, at 8:07 PM, Jason Molenda <jmole...@apple.com> wrote:
> > >
> > > Hi Tong, sorry for the delay in replying.
> > >
> > > I have a couple thoughts about the patch.  First, the change in 
> > > RegisterContextLLDB::GetFullUnwindPlanForFrame() forces the use of 
> > > eh_frame unwind instructions ("UnwindPlanAtCallSite" - which normally 
> > > means the eh_frame unwind instructions) for the currently-executing aka 
> > > zeroth frame.  We've talked about this before, but it's worth noting that 
> > > this patch includes that change.
> > >
> > > There's still the problem of detecting how *asynchronous* those eh_frame 
> > > unwind instructions are.  For instance, what do you get for an i386 
> > > program that does
> > >
> > > #include <stdio.h>
> > > int main()
> > > {
> > >  puts ("HI");
> > > }
> > >
> > > Most codegen will use a sequence like
> > >
> > >  call LNextInstruction
> > > .LNextInstruction
> > >  pop ebx
> > >
> > > this call & pop sequence is establishing the "pic base", it the program 
> > > will then use that address to find the "HI" constant data.  If you 
> > > compile this -fomit-frame-pointer, so we have to use the stack pointer to 
> > > find the CFA, do the eh_frame instructions describe this?
> > >
> > > It's a bit of an extreme example but it's one of those tricky cases where 
> > > asynchronous ("accurate at every instruction") unwind instructions and 
> > > synchronous ("accurate at places where we can throw an exception, or a 
> > > callee can throw an exception") unwind instructions are different.
> > >
> > >
> > > I would use behaves_like_zeroth_frame instead of if (IsFrameZero()) 
> > > because you can have a frame in the middle of the stack which was the 
> > > zeroth frame when an asynchronous signal came in -- in which case, the 
> > > "callee" stack frame will be sigtramp.
> > >
> > >
> > > You'd want to update the UnwindLogMsgVerbose() text, of course.
> > >
> > >
> > > What your DWARFCallFrameInfo::PatchUnwindPlanForX86() function is doing 
> > > is assuming that the unwind plan fails to include an epilogue 
> > > description, steps through all the instructions in the function looking 
> > > for the epilogue.
> > >
> > > DWARFCallFrameInfo doesn't seem like the right place for this.  There's 
> > > an assumption that the instructions came from eh_frame and that they are 
> > > incomplete.  It seems like it would more naturally live in the 
> > > UnwindAssembly plugin and it would have a name like 
> > > AugmentIncompleteUnwindPlanWithEpilogue or something like that.
> > >
> > > What if the CFI already does describe the epilogue?  I imagine we'll just 
> > > end up with a doubling of UnwindPlan Rows that describe the epilogue 
> > > instructions.
> > >
> > > What if we have a mid-function epilogue?  I've never seen gcc/clang 
> > > generate these for x86, but it's possible.  It's a common code sequence 
> > > on arm/arm64.  You can see a messy bit of code in 
> > > UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly which 
> > > handles these -- saving the UnwindPlan's unwind instructions when we see 
> > > the beginning of an epilogue, and once the epilogue is complete, 
> > > restoring the unwind instructions.
> > >
> > >
> > > I'm not opposed to the patch - but it does make the assumption that we're 
> > > going to use eh_frame for the currently executing function and that the 
> > > eh_frame instructions do not include a description of the epilogue.  (and 
> > > that there is only one epilogue in the function).  Mostly I want to call 
> > > all of those aspects out so we're clear what we're talking about here.  
> > > Let's clean it up a bit, put it in and see how it goes.
> > >
> > > J
> > >
> > >
> > >> On Aug 14, 2014, at 6:31 PM, Tong Shen <endlessr...@google.com> wrote:
> > >>
> > >> Hi Jason,
> > >>
> > >> Turns out we still need CFI for frame 0 in certain situations...
> > >>
> > >> A possible approach is to disassemble machine code, and manually adjust 
> > >> CFI for frame 0. For example, if we see "pop ebp; => ret", we set cfa to 
> > >> [esp]; if we see "call next-insn; => pop %ebp", we set cfa_offset+=4.
> > >>
> > >> Patch attached, now it just implements adjustment for "pop ebp; ret".
> > >>
> > >> If you think this approach is OK, I will go ahead and add other 
> > >> tricks(i386 pc relative addressing, more styles of epilogue, etc).
> > >>
> > >> Thank you for your time!
> > >>
> > >>
> > >> On Thu, Jul 31, 2014 at 12:50 PM, Tong Shen <endlessr...@google.com> 
> > >> wrote:
> > >> I think gdb's rationale for using CFI for leaf function is:
> > >> - gcc always generate CFI for progolue, so at function entry, we know 
> > >> the correct CFA;
> > >> - any stack pointer altering operation after that(mid-function & 
> > >> epilogue), we can recognize and handle them.
> > >> So basically, it assumes 2, hacks its way through 3 & 4, and pretends we 
> > >> are at 5.
> > >> Number of hacks we need seems to be small in x86 world, so this 
> > >> tradition is still here.
> > >>
> > >> Here's what gdb does for epilogue: normally when you run 'n', it will 
> > >> run one instruction a time till the next line/different stack id. But 
> > >> when it sees "pop %rbp; ret", it won't step into these instructions. 
> > >> Instead it will execute past them directly.
> > >> I didn't experiment with x86 pc-relative addressing; but I guess it will 
> > >> also recognize and execute past this pattern directly.
> > >>
> > >> So for compiler generated functions, what we do now with assembly parser 
> > >> now can be done with CFI + those gdb hacks.
> > >> And for hand-written assembly, i think CFI is almost always precise at 
> > >> instruction level. In this case, utilizing CFI instead of assembly 
> > >> parser will be a big help.
> > >>
> > >> So maybe we can apply those hacks, and trust CFI only for x86 & x86_64 
> > >> targets?
> > >>
> > >>
> > >> On Thu, Jul 31, 2014 at 12:02 AM, Jason Molenda <jmole...@apple.com> 
> > >> wrote:
> > >> I think we could think of five levels of eh_frame information:
> > >>
> > >>
> > >> 1 unwind instructions at exception throw locations & locations where a 
> > >> callee may throw an exception
> > >>
> > >> 2 unwind instructions that describe the prologue
> > >>
> > >> 3 unwind instructions that describe the epilogue at the end of the 
> > >> function
> > >>
> > >> 4 unwind instructions that describe mid-function epilogues (I see these 
> > >> on arm all the time, don't see them on x86 with compiler generated code 
> > >> - but we don't use eh_frame on arm at Apple, I'm just mentioning it for 
> > >> completeness)
> > >>
> > >> 5 unwind instructions that describe any changes mid-function needed to 
> > >> unwind at all instructions ("asynchronous unwind information")
> > >>
> > >>
> > >> The eh_frame section only guarantees #1.  gcc and clang always do #1 and 
> > >> #2.  Modern gcc's do #3.  I don't know if gcc would do #4 on arm but 
> > >> it's not important, I just mention it for completeness.  And no one does 
> > >> #5 (as far as I know), even in the DWARF debug_frame section.
> > >>
> > >> I think it maybe possible to detect if an eh_frame entry fulfills #3 by 
> > >> looking if the CFA definition on the last row is the same as the initial 
> > >> CFA definition.  But I'm not sure how a debugger could use heuristics to 
> > >> determine much else.
> > >>
> > >>
> > >> In fact, detecting #3 may be the easiest thing to detect.  I'm not sure 
> > >> if the debugger could really detect #2 except maybe if the function had 
> > >> a standard prologue (push rbp, mov rsp rbp) and the eh_frame didn't 
> > >> describe the effects of these instructions, the debugger could know that 
> > >> the eh_frame does not describe the prologue.
> > >>
> > >>
> > >>
> > >>
> > >>> On Jul 30, 2014, at 6:58 PM, Tong Shen <endlessr...@google.com> wrote:
> > >>>
> > >>> Ah I understand now.
> > >>>
> > >>> Now prologue seems always included in CFI fro gcc & clang; and newer 
> > >>> gcc includes epilogue as well.
> > >>> Maybe we can detect and use them when they are available?
> > >>>
> > >>>
> > >>> On Wed, Jul 30, 2014 at 6:44 PM, Jason Molenda <jmole...@apple.com> 
> > >>> wrote:
> > >>> Ah, it looks like gcc changed since I last looked at its eh_frame 
> > >>> output.
> > >>>
> > >>> It's not a bug -- the eh_frame unwind instructions only need to be 
> > >>> accurate at instructions where an exception can be thrown, or where a 
> > >>> callee function can throw an exception.  There's no requirement to 
> > >>> include prologue or epilogue instructions in the eh_frame.
> > >>>
> > >>> And unfortunately from lldb's perspective, when we see eh_frame we'll 
> > >>> never know how descriptive it is.  If it's old-gcc or clang, it won't 
> > >>> include epilogue instructions.  If it's from another compiler, it may 
> > >>> not include any prologue/epilogue instructions at all.
> > >>>
> > >>> Maybe we could look over the UnwindPlan rows and see if the CFA 
> > >>> definition of the last row matches the initial row's CFA definition.  
> > >>> That would show that the epilogue is described.  Unless it is a 
> > >>> tail-call (aka noreturn) function - in which case the stack is never 
> > >>> restored.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>> On Jul 30, 2014, at 6:32 PM, Tong Shen <endlessr...@google.com> wrote:
> > >>>>
> > >>>> GCC seems to generate a row for epilogue.
> > >>>> Do you think this is a clang bug, or at least a discrepancy between 
> > >>>> clang & gcc?
> > >>>>
> > >>>> Source:
> > >>>> int f() {
> > >>>>      puts("HI\n");
> > >>>>      return 5;
> > >>>> }
> > >>>>
> > >>>> Compile option: only -g
> > >>>>
> > >>>> gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
> > >>>> clang version 3.5.0 (213114)
> > >>>>
> > >>>> Env: Ubuntu 14.04, x86_64
> > >>>>
> > >>>> drawfdump -F of clang binary:
> > >>>> <    2><0x00400530:0x00400559><f><fde offset 0x00000088 length: 
> > >>>> 0x0000001c><eh aug data len 0x0>
> > >>>>        0x00400530: <off cfa=08(r7) > <off r16=-8(cfa) >
> > >>>>        0x00400531: <off cfa=16(r7) > <off r6=-16(cfa) > <off 
> > >>>> r16=-8(cfa) >
> > >>>>        0x00400534: <off cfa=16(r6) > <off r6=-16(cfa) > <off 
> > >>>> r16=-8(cfa) >
> > >>>>
> > >>>> drawfdump -F of gcc binary:
> > >>>> <    1><0x0040052d:0x00400542><f><fde offset 0x00000070 length: 
> > >>>> 0x0000001c><eh aug data len 0x0>
> > >>>>        0x0040052d: <off cfa=08(r7) > <off r16=-8(cfa) >
> > >>>>        0x0040052e: <off cfa=16(r7) > <off r6=-16(cfa) > <off 
> > >>>> r16=-8(cfa) >
> > >>>>        0x00400531: <off cfa=16(r6) > <off r6=-16(cfa) > <off 
> > >>>> r16=-8(cfa) >
> > >>>>        0x00400541: <off cfa=08(r7) > <off r6=-16(cfa) > <off 
> > >>>> r16=-8(cfa) >
> > >>>>
> > >>>>
> > >>>> On Wed, Jul 30, 2014 at 5:43 PM, Jason Molenda <jmole...@apple.com> 
> > >>>> wrote:
> > >>>> I'm open to trying to trust eh_frame at frame 0 for x86_64.  The lack 
> > >>>> of epilogue descriptions in eh_frame is the biggest problem here.
> > >>>>
> > >>>> When you "step" or "next" in the debugger, the debugger instruction 
> > >>>> steps across the source line until it gets to the next source line.  
> > >>>> Every time it stops after an instruction step, it confirms that it is 
> > >>>> (1) between the start and end pc values for the source line, and (2) 
> > >>>> that the "stack id" (start address of the function + CFA address) is 
> > >>>> the same.  If it stops and the stack id has changed, for a "next" 
> > >>>> command, it will backtrace one stack frame to see if it stepped into a 
> > >>>> function.  If so, it sets a breakpoint on the return address and 
> > >>>> continues.
> > >>>>
> > >>>> If you switch lldb to prefer eh_frame instructions for x86_64, e.g.
> > >>>>
> > >>>> Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp
> > >>>> ===================================================================
> > >>>> --- source/Plugins/Process/Utility/RegisterContextLLDB.cpp      
> > >>>> (revision 214344)
> > >>>> +++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp      
> > >>>> (working copy)
> > >>>> @@ -791,6 +791,22 @@
> > >>>>         }
> > >>>>     }
> > >>>>
> > >>>> +    // For x86_64 debugging, let's try using the eh_frame 
> > >>>> instructions even if this is the currently
> > >>>> +    // executing function (frame zero).
> > >>>> +    Target *target = exe_ctx.GetTargetPtr();
> > >>>> +    if (target
> > >>>> +        && (target->GetArchitecture().GetCore() == 
> > >>>> ArchSpec::eCore_x86_64_x86_64h
> > >>>> +            || target->GetArchitecture().GetCore() == 
> > >>>> ArchSpec::eCore_x86_64_x86_64))
> > >>>> +    {
> > >>>> +        unwind_plan_sp = func_unwinders_sp->GetUnwindPlanAtCallSite 
> > >>>> (m_current_offset_backed_up_one);
> > >>>> +        int valid_offset = -1;
> > >>>> +        if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp, 
> > >>>> valid_offset))
> > >>>> +        {
> > >>>> +            UnwindLogMsgVerbose ("frame uses %s for full UnwindPlan, 
> > >>>> preferred over assembly profiling on x86_64", 
> > >>>> unwind_plan_sp->GetSourceName().GetCString());
> > >>>> +            return unwind_plan_sp;
> > >>>> +        }
> > >>>> +    }
> > >>>> +
> > >>>>     // Typically the NonCallSite UnwindPlan is the unwind created by 
> > >>>> inspecting the assembly language instructions
> > >>>>     if (behaves_like_zeroth_frame)
> > >>>>     {
> > >>>>
> > >>>>
> > >>>> you'll find that you have to "next" twice to step out of a function.  
> > >>>> Why?  With a simple function like:
> > >>>>
> > >>>> * thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25 at 
> > >>>> a.c:5, queue = 'com.apple.main-thread', stop reason = step over
> > >>>>    #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5
> > >>>>   2    int foo ()
> > >>>>   3    {
> > >>>>   4        puts("HI");
> > >>>> -> 5        return 5;
> > >>>>   6    }
> > >>>>   7
> > >>>>   8    int bar ()
> > >>>> (lldb) disass
> > >>>> a.out`foo at a.c:3:
> > >>>>   0x100000ea0:  pushq  %rbp
> > >>>>   0x100000ea1:  movq   %rsp, %rbp
> > >>>>   0x100000ea4:  subq   $0x10, %rsp
> > >>>>   0x100000ea8:  leaq   0x6b(%rip), %rdi          ; "HI"
> > >>>>   0x100000eaf:  callq  0x100000efa               ; symbol stub for: 
> > >>>> puts
> > >>>>   0x100000eb4:  movl   $0x5, %ecx
> > >>>> -> 0x100000eb9:  movl   %eax, -0x4(%rbp)
> > >>>>   0x100000ebc:  movl   %ecx, %eax
> > >>>>   0x100000ebe:  addq   $0x10, %rsp
> > >>>>   0x100000ec2:  popq   %rbp
> > >>>>   0x100000ec3:  retq
> > >>>>
> > >>>>
> > >>>> if you do "next" lldb will instruction step, comparing the stack ID at 
> > >>>> every stop, until it gets to 0x100000ec3 at which point the stack ID 
> > >>>> will change.  The CFA address (which the eh_frame tells us is rbp+16) 
> > >>>> just changed to the caller's CFA address because we're about to 
> > >>>> return.  The eh_frame instructions really need to tell us that the CFA 
> > >>>> is now rsp+8 at 0x100000ec3.
> > >>>>
> > >>>> The end result is that you need to "next" twice to step out of a 
> > >>>> function.
> > >>>>
> > >>>> AssemblyParse_x86 has a special bit where it looks or the 'ret' 
> > >>>> instruction sequence at the end of the function -
> > >>>>
> > >>>>   // Now look at the byte at the end of the AddressRange for a limited 
> > >>>> attempt at describing the
> > >>>>    // epilogue.  We're looking for the sequence
> > >>>>
> > >>>>    //  [ 0x5d ] mov %rbp, %rsp
> > >>>>    //  [ 0xc3 ] ret
> > >>>>    //  [ 0xe8 xx xx xx xx ] call __stack_chk_fail  (this is sometimes 
> > >>>> the final insn in the function)
> > >>>>
> > >>>>    // We want to add a Row describing how to unwind when we're stopped 
> > >>>> on the 'ret' instruction where the
> > >>>>    // CFA is no longer defined in terms of rbp, but is now defined in 
> > >>>> terms of rsp like on function entry.
> > >>>>
> > >>>>
> > >>>> and adds an extra row of unwind details for that instruction.
> > >>>>
> > >>>>
> > >>>> I mention x86_64 as being a possible good test case here because I 
> > >>>> worry about the i386 picbase sequence (call next-instruction; pop 
> > >>>> $ebx) which occurs a lot.  But for x86_64, my main concern is the 
> > >>>> epilogues.
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On Jul 30, 2014, at 2:52 PM, Tong Shen <endlessr...@google.com> wrote:
> > >>>>>
> > >>>>> Thanks Jason! That's a very informative post, clarify things a lot :-)
> > >>>>>
> > >>>>> Well I have to admit that my patch is specifically for certain kind 
> > >>>>> of functions, and now I see that's not the general case.
> > >>>>>
> > >>>>> I did some experiment with gdb. gdb uses CFI for frame 0, either x86 
> > >>>>> or x86_64. It looks for FDE of frame 0, and do CFA calculations 
> > >>>>> according to that.
> > >>>>>
> > >>>>> - For compiler generated functions: I think there are 2 usage 
> > >>>>> scenarios for frame 0: breakpoint and signal.
> > >>>>>    - Breakpoints are usually at source line boundary instead of 
> > >>>>> instruction boundary, and generally we won't be caught at stack 
> > >>>>> pointer changing locations, so CFI is still valid.
> > >>>>>    - For signal, synchronous unwind table may not be sufficient here. 
> > >>>>> But only stack changing instructions will cause incorrect CFA 
> > >>>>> calculation, so it' not always the case.
> > >>>>> - For hand written assembly functions: from what I've seen, most of 
> > >>>>> the time CFI is present and actually asynchronous.
> > >>>>> So it seems that in most cases, even with only synchronous unwind 
> > >>>>> table, CFI is still correct.
> > >>>>>
> > >>>>> I believe we can trust eh_frame for frame 0 and use assembly 
> > >>>>> profiling as fallback. If both failed, maybe code owner should use 
> > >>>>> -fasynchronous-unwind-tables :-)
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <jmole...@apple.com> 
> > >>>>> wrote:
> > >>>>> It was a tricky one and got lost in the shuffle of a busy week.  I 
> > >>>>> was always reluctant to try profiling all the instructions in a 
> > >>>>> function.  On x86, compiler generated code (gcc/clang anyway) is very 
> > >>>>> simplistic about setting up the stack frame at the start and only 
> > >>>>> having one epilogue - so anything fancier risked making mistakes and 
> > >>>>> could possibly have a performance impact as we run functions through 
> > >>>>> the disassembler.
> > >>>>>
> > >>>>> For hand-written assembly functions (which can be very creative with 
> > >>>>> their prologue/epilogue and where it is placed), my position is that 
> > >>>>> they should write eh_frame instructions in their assembly source to 
> > >>>>> tell lldb where to find things.  There is one or two libraries on Mac 
> > >>>>> OS X where we break the "ignore eh_frame for the currently executing 
> > >>>>> function" because there are many hand-written assembly functions in 
> > >>>>> there and the eh_frame is going to beat our own analysis.
> > >>>>>
> > >>>>>
> > >>>>> After I wrote the x86 unwinder, Greg and Caroline implemented the arm 
> > >>>>> unwinder where it emulates every instruction in the function looking 
> > >>>>> for prologue/epilogue instructions.  We haven't seen it having a 
> > >>>>> particularly bad impact performance-wise (lldb only does this 
> > >>>>> disassembly for functions that it finds on stacks during an execution 
> > >>>>> run, and it saves the result so it won't re-compute it for a given 
> > >>>>> function).  The clang armv7 codegen often has mid-function epilogues 
> > >>>>> (early returns) which definitely complicated things and made it 
> > >>>>> necessary to step through the entire function bodies.  There's a 
> > >>>>> bunch of code I added to support these mid-function epilogues - I 
> > >>>>> have to save the register save state when I see an instruction which 
> > >>>>> looks like an epilogue, and when I see the final ret instruction (aka 
> > >>>>> restoring the saved lr contents into pc), I re-install the register 
> > >>>>> save state from before the epilogue started.
> > >>>>>
> > >>>>> These things always make me a little nervous because the instruction 
> > >>>>> analyzer obviously is doing a static analysis so it knows nothing 
> > >>>>> about flow control.  Tong's patch stops when it sees the first CALL 
> > >>>>> instruction - but that's not right, that's just solving the problem 
> > >>>>> for his particular function which doesn't have any CALL instructions 
> > >>>>> before his prologue. :) You could imagine a function which saves a 
> > >>>>> couple of registers, calls another function, then saves a couple more 
> > >>>>> because it needs more scratch registers.
> > >>>>>
> > >>>>> If we're going to change to profiling deep into the function -- and 
> > >>>>> I'm not opposed to doing that, it's been fine on arm -- we should 
> > >>>>> just do the entire function I think.
> > >>>>>
> > >>>>>
> > >>>>> Another alternative would be to trust eh_frame on x86_64 at frame 0.  
> > >>>>> This is one of those things where there's not a great solution.  The 
> > >>>>> unwind instructions in eh_frame are only guaranteed to be accurate 
> > >>>>> for synchronous unwinds -- that is, they are only guaranteed to be 
> > >>>>> accurate at places where an exception could be thrown - at call 
> > >>>>> sites.  So for instances, there's no reason why the compiler has to 
> > >>>>> describe the function prologue instructions at all.  There's no 
> > >>>>> requirement that the eh_frame instructions describe the epilogue 
> > >>>>> instructions.  The information about spilled registers only needs to 
> > >>>>> be emitted where we could throw an exception, or where a callee could 
> > >>>>> throw an exception.
> > >>>>>
> > >>>>> clang/gcc both emit detailed instructions for the prologue setup.  
> > >>>>> But for i386 codegen if the compiler needs to access some pc-relative 
> > >>>>> data, it will do a "call next-instruction; pop %eax" to get the 
> > >>>>> current pc value.  (x86_64 has rip-relative addressing so this isn't 
> > >>>>> needed)  If you're debugging -fomit-frame-pointer code, that means 
> > >>>>> your CFA is expressed in terms of the stack pointer and the stack 
> > >>>>> pointer just changed mid-function --- and eh_frame instructions don't 
> > >>>>> describe this.
> > >>>>>
> > >>>>> The end result: If you want accurate unwinds 100% of the time, you 
> > >>>>> can't rely on the unwind instructions from eh_frame.  But they'll get 
> > >>>>> you accurate unwinds 99.9% of the time ...  also, last I checked, 
> > >>>>> neither clang nor gcc describe the epilogue instructions.
> > >>>>>
> > >>>>>
> > >>>>> In *theory* the unwind instructions from the DWARF debug_frame 
> > >>>>> section should be asynchronous -- they should describe how to find 
> > >>>>> the CFA address for every instruction in the function.  Which makes 
> > >>>>> sense - you want eh_frame to be compact because it's bundled into the 
> > >>>>> executable, so it should only have the information necessary for 
> > >>>>> exception handling and you can put the verbose stuff in debug_frame 
> > >>>>> DWARF for debuggers.  But instead (again, last time I checked), the 
> > >>>>> compilers put the exact same thing in debug_frame even if you use the 
> > >>>>> -fasynchronous-unwind-tables (or whatever that switch was) option.
> > >>>>>
> > >>>>>
> > >>>>> So I don't know, maybe we should just start trusting eh_frame at 
> > >>>>> frame 0 and write off those .1% cases where it isn't correct instead 
> > >>>>> of trying to get too fancy with the assembly analysis code.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> On Jul 29, 2014, at 4:17 PM, Todd Fiala <tfi...@google.com> wrote:
> > >>>>>>
> > >>>>>> Hey Jason,
> > >>>>>>
> > >>>>>> Do you have any feedback on this?
> > >>>>>>
> > >>>>>> Thanks!
> > >>>>>>
> > >>>>>> -Todd
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <endlessr...@google.com> 
> > >>>>>> wrote:
> > >>>>>> Sorry, wrong version of patch...
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <endlessr...@google.com> 
> > >>>>>> wrote:
> > >>>>>> Hi Molenda, lldb-commits,
> > >>>>>>
> > >>>>>> For now, x86 assembly profiler will stop after 10 "non-prologue" 
> > >>>>>> instructions. In practice it may not be sufficient. For example, we 
> > >>>>>> have a hand-written assembly function, which have hundreds of 
> > >>>>>> instruction before actual (stack-adjusting) prologue instructions.
> > >>>>>>
> > >>>>>> One way is to change the limit to 1000; but there will always be 
> > >>>>>> functions that break the limit :-) I believe the right thing to do 
> > >>>>>> here is parsing all instructions before "ret"/"call" as prologue 
> > >>>>>> instructions.
> > >>>>>>
> > >>>>>> Here's what I changed:
> > >>>>>> - For "push %rbx" and "mov %rbx, -8(%rbp)": only add first row for 
> > >>>>>> that register. They may appear multiple times in function body. But 
> > >>>>>> as long as one of them appears, first appearance should be in 
> > >>>>>> prologue(If it's not in prologue, this function will not use %rbx, 
> > >>>>>> so these 2 instructions should not appear at all).
> > >>>>>> - Also monitor "add %rsp 0x20".
> > >>>>>> - Remove non prologue instruction count.
> > >>>>>> - Add "call" instruction detection, and stop parsing after it.
> > >>>>>>
> > >>>>>> Thanks.
> > >>>>>>
> > >>>>>> --
> > >>>>>> Best Regards, Tong Shen
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Best Regards, Tong Shen
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> lldb-commits mailing list
> > >>>>>> lldb-commits@cs.uiuc.edu
> > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Todd Fiala |   Software Engineer |     tfi...@google.com |     
> > >>>>>> 650-943-3180
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best Regards, Tong Shen
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best Regards, Tong Shen
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards, Tong Shen
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Tong Shen
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Tong Shen
> > >> <adjust_cfi_for_frame_zero.patch>
> > >
> >
> >
> >
> >
> > --
> > Best Regards, Tong Shen
> >
> >
> >
> > --
> > Best Regards, Tong Shen
> 
> 
> 
> 
> -- 
> Best Regards, Tong Shen
> 
> _______________________________________________
> lldb-commits mailing list
> lldb-commits@cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
> 
> 
> 
> 
> -- 
> Todd Fiala |   Software Engineer |     tfi...@google.com |     650-943-3180
> 
> 
> 
> 
> -- 
> Todd Fiala |   Software Engineer |     tfi...@google.com |     650-943-3180
> 
> 
> 
> 
> -- 
> Todd Fiala |   Software Engineer |     tfi...@google.com |     650-943-3180
> 
> 
> 
> 
> -- 
> Best Regards, Tong Shen
> <augment_eh_frame.patch>_______________________________________________
> lldb-commits mailing list
> lldb-commits@cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits


_______________________________________________
lldb-commits mailing list
lldb-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits

Re: [Lldb-commits] [PATCH] Profile Assembly Until Ret Instruction

Reply via email to