Hi Tong, I read it over and I think we should commit it and try living on it. With unwinding there's usually nothing like real world experience with a change to find crazy edge cases. :)
You've been looking at this code more than me lately but I think my initial definitions in AssemblyParse_x86, int m_machine_ip_regnum; int m_machine_sp_regnum; int m_machine_fp_regnum; int m_lldb_ip_regnum; int m_lldb_sp_regnum; int m_lldb_fp_regnum; should be changed to uint32_t's - what do you think? When I wrote the code originally the int v. uint32_t mismatches weren't a big deal but this class has picked up a lot of static_cast<uint32_t>'s trying to work around this mismatch. Thanks for working on this! Jason On Aug 22, 2014, at 6:37 PM, Tong Shen <endlessr...@google.com> wrote: > Hi Jason, > > Updated patch attached. > > Augmentation code is in UnwindAssembly-x86.cpp. > > Current augmentation progress: > - Get eh_frame for the function. > - Inspect each instruction: > * If there's an FDE for this instruction, use it. > * Otherwise, see if this instruction will change how we locate CFA and > make corresponding changes. > > The code assume there's prologue description. > It checks whether there's an FDE for the first instruction; if not, > augmentation is aborted and we fallback to assembly profiling. > > I only tested it with i386 epilogue and pc relative addressing. > If you think it's OK, I will write a complete test. > > Thanks for your time! > > > On Fri, Aug 22, 2014 at 9:58 AM, Todd Fiala <tfi...@google.com> wrote: > Ok - holding off on checking this per communication with Tong. Will see a > new patch later today on this. > > > On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfi...@google.com> wrote: > Er I'll "get it" in... eek.. > > > On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfi...@google.com> wrote: > I'm going to test this now. If it all looks good, I'll ge tit in. > > > On Tue, Aug 19, 2014 at 5:01 PM, Tong Shen <endlessr...@google.com> wrote: > Thanks Jason! > I will finish this patch and let's see how it goes. > > P.S. I know a little about eh_frame stuff; I added CFI to the new Android > ahead-of-time Java compiler so AOT'ed code can properly unwind :-) > > > > On Tue, Aug 19, 2014 at 4:51 PM, Jason Molenda <jmole...@apple.com> wrote: > The CIE sets the initial unwind state -- the CIE may describe the unwind > state at the first instruction (as it always does with gcc, clang) but in > theory it could describe the unwind state once the prologue had executed. > > The idea is that there is one CIE entry which describes a typical > at-first-instruction unwind state and then many FDEs that describe the unwind > instructions for specific functions - they all use that one CIE. > > Anyway, that's just an implementation detail of eh_frame. I honestly don't > think we should worry about incomplete eh_frame - let's try living on them > and see how it works in practice. > > It may be possible to categorize eh_frame to see how complete it is. > Compiler-generated x86 prologues are very regular, it would be possible to > look at the first few bytes of a function for some pushes or stack pointer > changes and see if the eh_frame describes that. We know what the unwind > state is on the first instruction of a function (it's determined by the ABI) > -- does the eh_frame have the same instructions? Can we can through the > function for an epilogue, and if we find one, does the eh_frame have unwind > instructions there? > > But I don't want to have the perfect be the enemy of the good. IMO let's > take the plunge and try, to use eh_frame and see how that goes. We can > refine it later, or back it out again (it will be a very small change to > RegisterContextLLDB) if necessary. > > > > On Aug 19, 2014, at 4:41 PM, Tong Shen <endlessr...@google.com> wrote: > > > > And for no prologue case: > > We can detect this easily (any CFI for start address?) and bail out, so we > > will fallback to assembly profiler. > > > > > > On Tue, Aug 19, 2014 at 4:36 PM, Tong Shen <endlessr...@google.com> wrote: > > Ahh sorry I've been working on something else this week and didn't get back > > to you in time. > > And you've been very patient and informative. Thanks! > > > > I'm only suggesting it for x86 / x86_64. What I am doing here relies on: > > - Compiler describes prologue; > > - We can figure our all mid function CFA changes by inspecting instructions. > > > > For frame 0, the new progress for CFA locating will look like this: > > - Find the nearest CFI available before current PC. > > - If the CFI is for current PC, viola :-) If not, continue. > > - Inspect all instructions in between, and make changes to CFA accordingly. > > This can solve the PC relative addressing case. > > - For epilogue, detect if we are in middle of an epilogue. Considering that > > there are not many patterns and they are all simple, I think we can > > enumerate them and handle accordingly. > > > > From what I've seen so far, this actually can solve most of gcc/clang > > generated code. > > For JIT'ed code or hand written assembly, if there's no asynchronous CFI we > > are screwed anyway, so trying this won't hurt either (except some extra > > running time).\ > > > > I hope I explain my thoughts clearly. > > > > Thank you. > > > > > > > > On Tue, Aug 19, 2014 at 4:22 PM, Jason Molenda <jmole...@apple.com> wrote: > > Hi Tong, my message was a little rambling. Let's be specific. > > > > We are changing lldb to trust eh_frame instructions on the > > currently-executing aka 0th frame. > > > > In practice, gcc and clang eh_frame both describe the prologue, so this is > > OK. > > > > Old gcc and clang eh_frame do not describe the epilogue. So we need to add > > a pass for i386/x86_64 (at least) to augment the eh_frame-sourced unwind > > instructions. I don't know if it would be best to augment eh_frame > > UnwindPlans when we create them in DWARFCallFrameInfo or if it would be > > better to do it lazily when we are actually using the unwind instructions > > in RegisterContextLLDB (probably RegisterContextLLDB like you were doing). > > We should only do it once for a given function, of course. > > > > I think it would cleanest if the augmentation function lived in the > > UnwindAssembly class. But I haven't looked how easy it is to get an > > UnwindAssembly object where we need it. > > > > > > Thanks for taking this on. It will be interesting to try living entirely > > off eh_frame and see how that works for all the architectures/environments > > lldb supports. > > > > I worry a little that we're depending on the generous eh_frame from > > clang/gcc and if we try to run on icc (Intel's compiler) or something like > > that, we may have no prologue instructions and stepping will work very > > poorly. But we'll cross that bridge when we get to it. > > > > > > > > > On Aug 15, 2014, at 8:07 PM, Jason Molenda <jmole...@apple.com> wrote: > > > > > > Hi Tong, sorry for the delay in replying. > > > > > > I have a couple thoughts about the patch. First, the change in > > > RegisterContextLLDB::GetFullUnwindPlanForFrame() forces the use of > > > eh_frame unwind instructions ("UnwindPlanAtCallSite" - which normally > > > means the eh_frame unwind instructions) for the currently-executing aka > > > zeroth frame. We've talked about this before, but it's worth noting that > > > this patch includes that change. > > > > > > There's still the problem of detecting how *asynchronous* those eh_frame > > > unwind instructions are. For instance, what do you get for an i386 > > > program that does > > > > > > #include <stdio.h> > > > int main() > > > { > > > puts ("HI"); > > > } > > > > > > Most codegen will use a sequence like > > > > > > call LNextInstruction > > > .LNextInstruction > > > pop ebx > > > > > > this call & pop sequence is establishing the "pic base", it the program > > > will then use that address to find the "HI" constant data. If you > > > compile this -fomit-frame-pointer, so we have to use the stack pointer to > > > find the CFA, do the eh_frame instructions describe this? > > > > > > It's a bit of an extreme example but it's one of those tricky cases where > > > asynchronous ("accurate at every instruction") unwind instructions and > > > synchronous ("accurate at places where we can throw an exception, or a > > > callee can throw an exception") unwind instructions are different. > > > > > > > > > I would use behaves_like_zeroth_frame instead of if (IsFrameZero()) > > > because you can have a frame in the middle of the stack which was the > > > zeroth frame when an asynchronous signal came in -- in which case, the > > > "callee" stack frame will be sigtramp. > > > > > > > > > You'd want to update the UnwindLogMsgVerbose() text, of course. > > > > > > > > > What your DWARFCallFrameInfo::PatchUnwindPlanForX86() function is doing > > > is assuming that the unwind plan fails to include an epilogue > > > description, steps through all the instructions in the function looking > > > for the epilogue. > > > > > > DWARFCallFrameInfo doesn't seem like the right place for this. There's > > > an assumption that the instructions came from eh_frame and that they are > > > incomplete. It seems like it would more naturally live in the > > > UnwindAssembly plugin and it would have a name like > > > AugmentIncompleteUnwindPlanWithEpilogue or something like that. > > > > > > What if the CFI already does describe the epilogue? I imagine we'll just > > > end up with a doubling of UnwindPlan Rows that describe the epilogue > > > instructions. > > > > > > What if we have a mid-function epilogue? I've never seen gcc/clang > > > generate these for x86, but it's possible. It's a common code sequence > > > on arm/arm64. You can see a messy bit of code in > > > UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly which > > > handles these -- saving the UnwindPlan's unwind instructions when we see > > > the beginning of an epilogue, and once the epilogue is complete, > > > restoring the unwind instructions. > > > > > > > > > I'm not opposed to the patch - but it does make the assumption that we're > > > going to use eh_frame for the currently executing function and that the > > > eh_frame instructions do not include a description of the epilogue. (and > > > that there is only one epilogue in the function). Mostly I want to call > > > all of those aspects out so we're clear what we're talking about here. > > > Let's clean it up a bit, put it in and see how it goes. > > > > > > J > > > > > > > > >> On Aug 14, 2014, at 6:31 PM, Tong Shen <endlessr...@google.com> wrote: > > >> > > >> Hi Jason, > > >> > > >> Turns out we still need CFI for frame 0 in certain situations... > > >> > > >> A possible approach is to disassemble machine code, and manually adjust > > >> CFI for frame 0. For example, if we see "pop ebp; => ret", we set cfa to > > >> [esp]; if we see "call next-insn; => pop %ebp", we set cfa_offset+=4. > > >> > > >> Patch attached, now it just implements adjustment for "pop ebp; ret". > > >> > > >> If you think this approach is OK, I will go ahead and add other > > >> tricks(i386 pc relative addressing, more styles of epilogue, etc). > > >> > > >> Thank you for your time! > > >> > > >> > > >> On Thu, Jul 31, 2014 at 12:50 PM, Tong Shen <endlessr...@google.com> > > >> wrote: > > >> I think gdb's rationale for using CFI for leaf function is: > > >> - gcc always generate CFI for progolue, so at function entry, we know > > >> the correct CFA; > > >> - any stack pointer altering operation after that(mid-function & > > >> epilogue), we can recognize and handle them. > > >> So basically, it assumes 2, hacks its way through 3 & 4, and pretends we > > >> are at 5. > > >> Number of hacks we need seems to be small in x86 world, so this > > >> tradition is still here. > > >> > > >> Here's what gdb does for epilogue: normally when you run 'n', it will > > >> run one instruction a time till the next line/different stack id. But > > >> when it sees "pop %rbp; ret", it won't step into these instructions. > > >> Instead it will execute past them directly. > > >> I didn't experiment with x86 pc-relative addressing; but I guess it will > > >> also recognize and execute past this pattern directly. > > >> > > >> So for compiler generated functions, what we do now with assembly parser > > >> now can be done with CFI + those gdb hacks. > > >> And for hand-written assembly, i think CFI is almost always precise at > > >> instruction level. In this case, utilizing CFI instead of assembly > > >> parser will be a big help. > > >> > > >> So maybe we can apply those hacks, and trust CFI only for x86 & x86_64 > > >> targets? > > >> > > >> > > >> On Thu, Jul 31, 2014 at 12:02 AM, Jason Molenda <jmole...@apple.com> > > >> wrote: > > >> I think we could think of five levels of eh_frame information: > > >> > > >> > > >> 1 unwind instructions at exception throw locations & locations where a > > >> callee may throw an exception > > >> > > >> 2 unwind instructions that describe the prologue > > >> > > >> 3 unwind instructions that describe the epilogue at the end of the > > >> function > > >> > > >> 4 unwind instructions that describe mid-function epilogues (I see these > > >> on arm all the time, don't see them on x86 with compiler generated code > > >> - but we don't use eh_frame on arm at Apple, I'm just mentioning it for > > >> completeness) > > >> > > >> 5 unwind instructions that describe any changes mid-function needed to > > >> unwind at all instructions ("asynchronous unwind information") > > >> > > >> > > >> The eh_frame section only guarantees #1. gcc and clang always do #1 and > > >> #2. Modern gcc's do #3. I don't know if gcc would do #4 on arm but > > >> it's not important, I just mention it for completeness. And no one does > > >> #5 (as far as I know), even in the DWARF debug_frame section. > > >> > > >> I think it maybe possible to detect if an eh_frame entry fulfills #3 by > > >> looking if the CFA definition on the last row is the same as the initial > > >> CFA definition. But I'm not sure how a debugger could use heuristics to > > >> determine much else. > > >> > > >> > > >> In fact, detecting #3 may be the easiest thing to detect. I'm not sure > > >> if the debugger could really detect #2 except maybe if the function had > > >> a standard prologue (push rbp, mov rsp rbp) and the eh_frame didn't > > >> describe the effects of these instructions, the debugger could know that > > >> the eh_frame does not describe the prologue. > > >> > > >> > > >> > > >> > > >>> On Jul 30, 2014, at 6:58 PM, Tong Shen <endlessr...@google.com> wrote: > > >>> > > >>> Ah I understand now. > > >>> > > >>> Now prologue seems always included in CFI fro gcc & clang; and newer > > >>> gcc includes epilogue as well. > > >>> Maybe we can detect and use them when they are available? > > >>> > > >>> > > >>> On Wed, Jul 30, 2014 at 6:44 PM, Jason Molenda <jmole...@apple.com> > > >>> wrote: > > >>> Ah, it looks like gcc changed since I last looked at its eh_frame > > >>> output. > > >>> > > >>> It's not a bug -- the eh_frame unwind instructions only need to be > > >>> accurate at instructions where an exception can be thrown, or where a > > >>> callee function can throw an exception. There's no requirement to > > >>> include prologue or epilogue instructions in the eh_frame. > > >>> > > >>> And unfortunately from lldb's perspective, when we see eh_frame we'll > > >>> never know how descriptive it is. If it's old-gcc or clang, it won't > > >>> include epilogue instructions. If it's from another compiler, it may > > >>> not include any prologue/epilogue instructions at all. > > >>> > > >>> Maybe we could look over the UnwindPlan rows and see if the CFA > > >>> definition of the last row matches the initial row's CFA definition. > > >>> That would show that the epilogue is described. Unless it is a > > >>> tail-call (aka noreturn) function - in which case the stack is never > > >>> restored. > > >>> > > >>> > > >>> > > >>> > > >>>> On Jul 30, 2014, at 6:32 PM, Tong Shen <endlessr...@google.com> wrote: > > >>>> > > >>>> GCC seems to generate a row for epilogue. > > >>>> Do you think this is a clang bug, or at least a discrepancy between > > >>>> clang & gcc? > > >>>> > > >>>> Source: > > >>>> int f() { > > >>>> puts("HI\n"); > > >>>> return 5; > > >>>> } > > >>>> > > >>>> Compile option: only -g > > >>>> > > >>>> gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) > > >>>> clang version 3.5.0 (213114) > > >>>> > > >>>> Env: Ubuntu 14.04, x86_64 > > >>>> > > >>>> drawfdump -F of clang binary: > > >>>> < 2><0x00400530:0x00400559><f><fde offset 0x00000088 length: > > >>>> 0x0000001c><eh aug data len 0x0> > > >>>> 0x00400530: <off cfa=08(r7) > <off r16=-8(cfa) > > > >>>> 0x00400531: <off cfa=16(r7) > <off r6=-16(cfa) > <off > > >>>> r16=-8(cfa) > > > >>>> 0x00400534: <off cfa=16(r6) > <off r6=-16(cfa) > <off > > >>>> r16=-8(cfa) > > > >>>> > > >>>> drawfdump -F of gcc binary: > > >>>> < 1><0x0040052d:0x00400542><f><fde offset 0x00000070 length: > > >>>> 0x0000001c><eh aug data len 0x0> > > >>>> 0x0040052d: <off cfa=08(r7) > <off r16=-8(cfa) > > > >>>> 0x0040052e: <off cfa=16(r7) > <off r6=-16(cfa) > <off > > >>>> r16=-8(cfa) > > > >>>> 0x00400531: <off cfa=16(r6) > <off r6=-16(cfa) > <off > > >>>> r16=-8(cfa) > > > >>>> 0x00400541: <off cfa=08(r7) > <off r6=-16(cfa) > <off > > >>>> r16=-8(cfa) > > > >>>> > > >>>> > > >>>> On Wed, Jul 30, 2014 at 5:43 PM, Jason Molenda <jmole...@apple.com> > > >>>> wrote: > > >>>> I'm open to trying to trust eh_frame at frame 0 for x86_64. The lack > > >>>> of epilogue descriptions in eh_frame is the biggest problem here. > > >>>> > > >>>> When you "step" or "next" in the debugger, the debugger instruction > > >>>> steps across the source line until it gets to the next source line. > > >>>> Every time it stops after an instruction step, it confirms that it is > > >>>> (1) between the start and end pc values for the source line, and (2) > > >>>> that the "stack id" (start address of the function + CFA address) is > > >>>> the same. If it stops and the stack id has changed, for a "next" > > >>>> command, it will backtrace one stack frame to see if it stepped into a > > >>>> function. If so, it sets a breakpoint on the return address and > > >>>> continues. > > >>>> > > >>>> If you switch lldb to prefer eh_frame instructions for x86_64, e.g. > > >>>> > > >>>> Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp > > >>>> =================================================================== > > >>>> --- source/Plugins/Process/Utility/RegisterContextLLDB.cpp > > >>>> (revision 214344) > > >>>> +++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp > > >>>> (working copy) > > >>>> @@ -791,6 +791,22 @@ > > >>>> } > > >>>> } > > >>>> > > >>>> + // For x86_64 debugging, let's try using the eh_frame > > >>>> instructions even if this is the currently > > >>>> + // executing function (frame zero). > > >>>> + Target *target = exe_ctx.GetTargetPtr(); > > >>>> + if (target > > >>>> + && (target->GetArchitecture().GetCore() == > > >>>> ArchSpec::eCore_x86_64_x86_64h > > >>>> + || target->GetArchitecture().GetCore() == > > >>>> ArchSpec::eCore_x86_64_x86_64)) > > >>>> + { > > >>>> + unwind_plan_sp = func_unwinders_sp->GetUnwindPlanAtCallSite > > >>>> (m_current_offset_backed_up_one); > > >>>> + int valid_offset = -1; > > >>>> + if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp, > > >>>> valid_offset)) > > >>>> + { > > >>>> + UnwindLogMsgVerbose ("frame uses %s for full UnwindPlan, > > >>>> preferred over assembly profiling on x86_64", > > >>>> unwind_plan_sp->GetSourceName().GetCString()); > > >>>> + return unwind_plan_sp; > > >>>> + } > > >>>> + } > > >>>> + > > >>>> // Typically the NonCallSite UnwindPlan is the unwind created by > > >>>> inspecting the assembly language instructions > > >>>> if (behaves_like_zeroth_frame) > > >>>> { > > >>>> > > >>>> > > >>>> you'll find that you have to "next" twice to step out of a function. > > >>>> Why? With a simple function like: > > >>>> > > >>>> * thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25 at > > >>>> a.c:5, queue = 'com.apple.main-thread', stop reason = step over > > >>>> #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5 > > >>>> 2 int foo () > > >>>> 3 { > > >>>> 4 puts("HI"); > > >>>> -> 5 return 5; > > >>>> 6 } > > >>>> 7 > > >>>> 8 int bar () > > >>>> (lldb) disass > > >>>> a.out`foo at a.c:3: > > >>>> 0x100000ea0: pushq %rbp > > >>>> 0x100000ea1: movq %rsp, %rbp > > >>>> 0x100000ea4: subq $0x10, %rsp > > >>>> 0x100000ea8: leaq 0x6b(%rip), %rdi ; "HI" > > >>>> 0x100000eaf: callq 0x100000efa ; symbol stub for: > > >>>> puts > > >>>> 0x100000eb4: movl $0x5, %ecx > > >>>> -> 0x100000eb9: movl %eax, -0x4(%rbp) > > >>>> 0x100000ebc: movl %ecx, %eax > > >>>> 0x100000ebe: addq $0x10, %rsp > > >>>> 0x100000ec2: popq %rbp > > >>>> 0x100000ec3: retq > > >>>> > > >>>> > > >>>> if you do "next" lldb will instruction step, comparing the stack ID at > > >>>> every stop, until it gets to 0x100000ec3 at which point the stack ID > > >>>> will change. The CFA address (which the eh_frame tells us is rbp+16) > > >>>> just changed to the caller's CFA address because we're about to > > >>>> return. The eh_frame instructions really need to tell us that the CFA > > >>>> is now rsp+8 at 0x100000ec3. > > >>>> > > >>>> The end result is that you need to "next" twice to step out of a > > >>>> function. > > >>>> > > >>>> AssemblyParse_x86 has a special bit where it looks or the 'ret' > > >>>> instruction sequence at the end of the function - > > >>>> > > >>>> // Now look at the byte at the end of the AddressRange for a limited > > >>>> attempt at describing the > > >>>> // epilogue. We're looking for the sequence > > >>>> > > >>>> // [ 0x5d ] mov %rbp, %rsp > > >>>> // [ 0xc3 ] ret > > >>>> // [ 0xe8 xx xx xx xx ] call __stack_chk_fail (this is sometimes > > >>>> the final insn in the function) > > >>>> > > >>>> // We want to add a Row describing how to unwind when we're stopped > > >>>> on the 'ret' instruction where the > > >>>> // CFA is no longer defined in terms of rbp, but is now defined in > > >>>> terms of rsp like on function entry. > > >>>> > > >>>> > > >>>> and adds an extra row of unwind details for that instruction. > > >>>> > > >>>> > > >>>> I mention x86_64 as being a possible good test case here because I > > >>>> worry about the i386 picbase sequence (call next-instruction; pop > > >>>> $ebx) which occurs a lot. But for x86_64, my main concern is the > > >>>> epilogues. > > >>>> > > >>>> > > >>>> > > >>>>> On Jul 30, 2014, at 2:52 PM, Tong Shen <endlessr...@google.com> wrote: > > >>>>> > > >>>>> Thanks Jason! That's a very informative post, clarify things a lot :-) > > >>>>> > > >>>>> Well I have to admit that my patch is specifically for certain kind > > >>>>> of functions, and now I see that's not the general case. > > >>>>> > > >>>>> I did some experiment with gdb. gdb uses CFI for frame 0, either x86 > > >>>>> or x86_64. It looks for FDE of frame 0, and do CFA calculations > > >>>>> according to that. > > >>>>> > > >>>>> - For compiler generated functions: I think there are 2 usage > > >>>>> scenarios for frame 0: breakpoint and signal. > > >>>>> - Breakpoints are usually at source line boundary instead of > > >>>>> instruction boundary, and generally we won't be caught at stack > > >>>>> pointer changing locations, so CFI is still valid. > > >>>>> - For signal, synchronous unwind table may not be sufficient here. > > >>>>> But only stack changing instructions will cause incorrect CFA > > >>>>> calculation, so it' not always the case. > > >>>>> - For hand written assembly functions: from what I've seen, most of > > >>>>> the time CFI is present and actually asynchronous. > > >>>>> So it seems that in most cases, even with only synchronous unwind > > >>>>> table, CFI is still correct. > > >>>>> > > >>>>> I believe we can trust eh_frame for frame 0 and use assembly > > >>>>> profiling as fallback. If both failed, maybe code owner should use > > >>>>> -fasynchronous-unwind-tables :-) > > >>>>> > > >>>>> > > >>>>> On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <jmole...@apple.com> > > >>>>> wrote: > > >>>>> It was a tricky one and got lost in the shuffle of a busy week. I > > >>>>> was always reluctant to try profiling all the instructions in a > > >>>>> function. On x86, compiler generated code (gcc/clang anyway) is very > > >>>>> simplistic about setting up the stack frame at the start and only > > >>>>> having one epilogue - so anything fancier risked making mistakes and > > >>>>> could possibly have a performance impact as we run functions through > > >>>>> the disassembler. > > >>>>> > > >>>>> For hand-written assembly functions (which can be very creative with > > >>>>> their prologue/epilogue and where it is placed), my position is that > > >>>>> they should write eh_frame instructions in their assembly source to > > >>>>> tell lldb where to find things. There is one or two libraries on Mac > > >>>>> OS X where we break the "ignore eh_frame for the currently executing > > >>>>> function" because there are many hand-written assembly functions in > > >>>>> there and the eh_frame is going to beat our own analysis. > > >>>>> > > >>>>> > > >>>>> After I wrote the x86 unwinder, Greg and Caroline implemented the arm > > >>>>> unwinder where it emulates every instruction in the function looking > > >>>>> for prologue/epilogue instructions. We haven't seen it having a > > >>>>> particularly bad impact performance-wise (lldb only does this > > >>>>> disassembly for functions that it finds on stacks during an execution > > >>>>> run, and it saves the result so it won't re-compute it for a given > > >>>>> function). The clang armv7 codegen often has mid-function epilogues > > >>>>> (early returns) which definitely complicated things and made it > > >>>>> necessary to step through the entire function bodies. There's a > > >>>>> bunch of code I added to support these mid-function epilogues - I > > >>>>> have to save the register save state when I see an instruction which > > >>>>> looks like an epilogue, and when I see the final ret instruction (aka > > >>>>> restoring the saved lr contents into pc), I re-install the register > > >>>>> save state from before the epilogue started. > > >>>>> > > >>>>> These things always make me a little nervous because the instruction > > >>>>> analyzer obviously is doing a static analysis so it knows nothing > > >>>>> about flow control. Tong's patch stops when it sees the first CALL > > >>>>> instruction - but that's not right, that's just solving the problem > > >>>>> for his particular function which doesn't have any CALL instructions > > >>>>> before his prologue. :) You could imagine a function which saves a > > >>>>> couple of registers, calls another function, then saves a couple more > > >>>>> because it needs more scratch registers. > > >>>>> > > >>>>> If we're going to change to profiling deep into the function -- and > > >>>>> I'm not opposed to doing that, it's been fine on arm -- we should > > >>>>> just do the entire function I think. > > >>>>> > > >>>>> > > >>>>> Another alternative would be to trust eh_frame on x86_64 at frame 0. > > >>>>> This is one of those things where there's not a great solution. The > > >>>>> unwind instructions in eh_frame are only guaranteed to be accurate > > >>>>> for synchronous unwinds -- that is, they are only guaranteed to be > > >>>>> accurate at places where an exception could be thrown - at call > > >>>>> sites. So for instances, there's no reason why the compiler has to > > >>>>> describe the function prologue instructions at all. There's no > > >>>>> requirement that the eh_frame instructions describe the epilogue > > >>>>> instructions. The information about spilled registers only needs to > > >>>>> be emitted where we could throw an exception, or where a callee could > > >>>>> throw an exception. > > >>>>> > > >>>>> clang/gcc both emit detailed instructions for the prologue setup. > > >>>>> But for i386 codegen if the compiler needs to access some pc-relative > > >>>>> data, it will do a "call next-instruction; pop %eax" to get the > > >>>>> current pc value. (x86_64 has rip-relative addressing so this isn't > > >>>>> needed) If you're debugging -fomit-frame-pointer code, that means > > >>>>> your CFA is expressed in terms of the stack pointer and the stack > > >>>>> pointer just changed mid-function --- and eh_frame instructions don't > > >>>>> describe this. > > >>>>> > > >>>>> The end result: If you want accurate unwinds 100% of the time, you > > >>>>> can't rely on the unwind instructions from eh_frame. But they'll get > > >>>>> you accurate unwinds 99.9% of the time ... also, last I checked, > > >>>>> neither clang nor gcc describe the epilogue instructions. > > >>>>> > > >>>>> > > >>>>> In *theory* the unwind instructions from the DWARF debug_frame > > >>>>> section should be asynchronous -- they should describe how to find > > >>>>> the CFA address for every instruction in the function. Which makes > > >>>>> sense - you want eh_frame to be compact because it's bundled into the > > >>>>> executable, so it should only have the information necessary for > > >>>>> exception handling and you can put the verbose stuff in debug_frame > > >>>>> DWARF for debuggers. But instead (again, last time I checked), the > > >>>>> compilers put the exact same thing in debug_frame even if you use the > > >>>>> -fasynchronous-unwind-tables (or whatever that switch was) option. > > >>>>> > > >>>>> > > >>>>> So I don't know, maybe we should just start trusting eh_frame at > > >>>>> frame 0 and write off those .1% cases where it isn't correct instead > > >>>>> of trying to get too fancy with the assembly analysis code. > > >>>>> > > >>>>> > > >>>>> > > >>>>>> On Jul 29, 2014, at 4:17 PM, Todd Fiala <tfi...@google.com> wrote: > > >>>>>> > > >>>>>> Hey Jason, > > >>>>>> > > >>>>>> Do you have any feedback on this? > > >>>>>> > > >>>>>> Thanks! > > >>>>>> > > >>>>>> -Todd > > >>>>>> > > >>>>>> > > >>>>>> On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <endlessr...@google.com> > > >>>>>> wrote: > > >>>>>> Sorry, wrong version of patch... > > >>>>>> > > >>>>>> > > >>>>>> On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <endlessr...@google.com> > > >>>>>> wrote: > > >>>>>> Hi Molenda, lldb-commits, > > >>>>>> > > >>>>>> For now, x86 assembly profiler will stop after 10 "non-prologue" > > >>>>>> instructions. In practice it may not be sufficient. For example, we > > >>>>>> have a hand-written assembly function, which have hundreds of > > >>>>>> instruction before actual (stack-adjusting) prologue instructions. > > >>>>>> > > >>>>>> One way is to change the limit to 1000; but there will always be > > >>>>>> functions that break the limit :-) I believe the right thing to do > > >>>>>> here is parsing all instructions before "ret"/"call" as prologue > > >>>>>> instructions. > > >>>>>> > > >>>>>> Here's what I changed: > > >>>>>> - For "push %rbx" and "mov %rbx, -8(%rbp)": only add first row for > > >>>>>> that register. They may appear multiple times in function body. But > > >>>>>> as long as one of them appears, first appearance should be in > > >>>>>> prologue(If it's not in prologue, this function will not use %rbx, > > >>>>>> so these 2 instructions should not appear at all). > > >>>>>> - Also monitor "add %rsp 0x20". > > >>>>>> - Remove non prologue instruction count. > > >>>>>> - Add "call" instruction detection, and stop parsing after it. > > >>>>>> > > >>>>>> Thanks. > > >>>>>> > > >>>>>> -- > > >>>>>> Best Regards, Tong Shen > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> -- > > >>>>>> Best Regards, Tong Shen > > >>>>>> > > >>>>>> _______________________________________________ > > >>>>>> lldb-commits mailing list > > >>>>>> lldb-commits@cs.uiuc.edu > > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> -- > > >>>>>> Todd Fiala | Software Engineer | tfi...@google.com | > > >>>>>> 650-943-3180 > > >>>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> -- > > >>>>> Best Regards, Tong Shen > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> -- > > >>>> Best Regards, Tong Shen > > >>> > > >>> > > >>> > > >>> > > >>> -- > > >>> Best Regards, Tong Shen > > >> > > >> > > >> > > >> > > >> -- > > >> Best Regards, Tong Shen > > >> > > >> > > >> > > >> -- > > >> Best Regards, Tong Shen > > >> <adjust_cfi_for_frame_zero.patch> > > > > > > > > > > > > > -- > > Best Regards, Tong Shen > > > > > > > > -- > > Best Regards, Tong Shen > > > > > -- > Best Regards, Tong Shen > > _______________________________________________ > lldb-commits mailing list > lldb-commits@cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits > > > > > -- > Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180 > > > > > -- > Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180 > > > > > -- > Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180 > > > > > -- > Best Regards, Tong Shen > <augment_eh_frame.patch>_______________________________________________ > lldb-commits mailing list > lldb-commits@cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits _______________________________________________ lldb-commits mailing list lldb-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits