OK - holding off on checking this in, per communication with Tong. We will see a new patch on this later today.
On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfi...@google.com> wrote:
> Er I'll "get it" in... eek..
>
> On Fri, Aug 22, 2014 at 9:49 AM, Todd Fiala <tfi...@google.com> wrote:
>
>> I'm going to test this now. If it all looks good, I'll ge tit in.
>>
>> On Tue, Aug 19, 2014 at 5:01 PM, Tong Shen <endlessr...@google.com> wrote:
>>
>>> Thanks Jason!
>>> I will finish this patch and let's see how it goes.
>>>
>>> P.S. I know a little about eh_frame stuff; I added CFI to the new
>>> Android ahead-of-time Java compiler so AOT'ed code can properly unwind :-)
>>>
>>> On Tue, Aug 19, 2014 at 4:51 PM, Jason Molenda <jmole...@apple.com> wrote:
>>>
>>>> The CIE sets the initial unwind state -- the CIE may describe the
>>>> unwind state at the first instruction (as it always does with gcc, clang)
>>>> but in theory it could describe the unwind state once the prologue had
>>>> executed.
>>>>
>>>> The idea is that there is one CIE entry which describes a typical
>>>> at-first-instruction unwind state and then many FDEs that describe the
>>>> unwind instructions for specific functions - they all use that one CIE.
>>>>
>>>> Anyway, that's just an implementation detail of eh_frame. I honestly
>>>> don't think we should worry about incomplete eh_frame - let's try living on
>>>> them and see how it works in practice.
>>>>
>>>> It may be possible to categorize eh_frame to see how complete it is.
>>>> Compiler-generated x86 prologues are very regular, it would be possible to
>>>> look at the first few bytes of a function for some pushes or stack pointer
>>>> changes and see if the eh_frame describes that. We know what the unwind
>>>> state is on the first instruction of a function (it's determined by the
>>>> ABI) -- does the eh_frame have the same instructions? Can we scan through
>>>> the function for an epilogue, and if we find one, does the eh_frame have
>>>> unwind instructions there?
>>>>
>>>> But I don't want to have the perfect be the enemy of the good.
>>>> IMO let's take the plunge and try to use eh_frame and see how that goes. We
>>>> can refine it later, or back it out again (it will be a very small change
>>>> to RegisterContextLLDB) if necessary.
>>>>
>>>> > On Aug 19, 2014, at 4:41 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> >
>>>> > And for no prologue case:
>>>> > We can detect this easily (any CFI for start address?) and bail out,
>>>> > so we will fall back to the assembly profiler.
>>>> >
>>>> > On Tue, Aug 19, 2014 at 4:36 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > Ahh sorry I've been working on something else this week and didn't
>>>> > get back to you in time.
>>>> > And you've been very patient and informative. Thanks!
>>>> >
>>>> > I'm only suggesting it for x86 / x86_64. What I am doing here relies on:
>>>> > - Compiler describes prologue;
>>>> > - We can figure out all mid-function CFA changes by inspecting
>>>> >   instructions.
>>>> >
>>>> > For frame 0, the new process for locating the CFA will look like this:
>>>> > - Find the nearest CFI available before current PC.
>>>> > - If the CFI is for current PC, voilà :-) If not, continue.
>>>> > - Inspect all instructions in between, and make changes to CFA
>>>> >   accordingly. This can solve the PC relative addressing case.
>>>> > - For epilogue, detect if we are in the middle of an epilogue.
>>>> >   Considering that there are not many patterns and they are all simple, I
>>>> >   think we can enumerate them and handle accordingly.
>>>> >
>>>> > From what I've seen so far, this actually can solve most of gcc/clang
>>>> > generated code.
>>>> > For JIT'ed code or hand written assembly, if there's no asynchronous
>>>> > CFI we are screwed anyway, so trying this won't hurt either (except some
>>>> > extra running time).
>>>> >
>>>> > I hope I explain my thoughts clearly.
>>>> >
>>>> > Thank you.
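The frame 0 procedure listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: `CfaRow`, `Insn`, `BaseReg` and `CfaRuleAt` are hypothetical names, not lldb's UnwindPlan API, and the per-instruction stack pointer deltas are assumed to come from a disassembler pass that is not shown. The scan is linear and knows nothing about flow control, and the epilogue pattern matching from the last step is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not lldb's real types.
enum class BaseReg { StackPointer, FramePointer };

struct CfaRow {
    uint64_t addr;    // first address at which this CFI row applies
    BaseReg base;     // register the CFA is computed from
    int64_t offset;   // CFA = base + offset
};

struct Insn {
    uint64_t addr;     // address of the instruction
    int64_t sp_delta;  // stack pointer change once it has executed
                       // (e.g. -4 for an i386 push or call, +4 for a pop)
};

// Computes the CFA rule in effect when stopped at `pc`.
// Returns false when no CFI row exists at or before `pc`, so the caller can
// fall back to another unwind source such as the assembly profiler.
bool CfaRuleAt(const std::vector<CfaRow>& rows,
               const std::vector<Insn>& insns,
               uint64_t pc, CfaRow& out) {
    // Assumption: rows are sorted by address, as in an FDE.
    assert(std::is_sorted(rows.begin(), rows.end(),
                          [](const CfaRow& a, const CfaRow& b) {
                              return a.addr < b.addr;
                          }));

    // Step 1: find the nearest row at or before pc.
    const CfaRow* nearest = nullptr;
    for (const CfaRow& row : rows)
        if (row.addr <= pc) nearest = &row;
    if (nearest == nullptr) return false;

    out = *nearest;
    out.addr = pc;

    // Step 2: a row exactly at pc needs no adjustment. A CFA based on the
    // frame pointer is also left alone (assumption: the frame pointer is not
    // modified between the row and pc).
    if (nearest->addr == pc || nearest->base != BaseReg::StackPointer)
        return true;

    // Step 3: undo the stack pointer movement of every instruction that has
    // executed since the row. If sp dropped by N, the CFA is N further away.
    for (const Insn& insn : insns)
        if (insn.addr >= nearest->addr && insn.addr < pc)
            out.offset -= insn.sp_delta;
    return true;
}
```

For the i386 pic-base sequence, a row giving CFA = esp+16 at the `call` becomes esp+20 while stopped on the following `pop`, and goes back to esp+16 once the `pop` has executed.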
>>>> >
>>>> > On Tue, Aug 19, 2014 at 4:22 PM, Jason Molenda <jmole...@apple.com> wrote:
>>>> > Hi Tong, my message was a little rambling. Let's be specific.
>>>> >
>>>> > We are changing lldb to trust eh_frame instructions on the
>>>> > currently-executing aka 0th frame.
>>>> >
>>>> > In practice, gcc and clang eh_frame both describe the prologue, so
>>>> > this is OK.
>>>> >
>>>> > Old gcc and clang eh_frame do not describe the epilogue. So we need
>>>> > to add a pass for i386/x86_64 (at least) to augment the eh_frame-sourced
>>>> > unwind instructions. I don't know if it would be best to augment eh_frame
>>>> > UnwindPlans when we create them in DWARFCallFrameInfo or if it would be
>>>> > better to do it lazily when we are actually using the unwind instructions
>>>> > in RegisterContextLLDB (probably RegisterContextLLDB like you were doing).
>>>> > We should only do it once for a given function, of course.
>>>> >
>>>> > I think it would be cleanest if the augmentation function lived in the
>>>> > UnwindAssembly class. But I haven't looked at how easy it is to get an
>>>> > UnwindAssembly object where we need it.
>>>> >
>>>> > Thanks for taking this on. It will be interesting to try living
>>>> > entirely off eh_frame and see how that works for all the
>>>> > architectures/environments lldb supports.
>>>> >
>>>> > I worry a little that we're depending on the generous eh_frame from
>>>> > clang/gcc and if we try to run on icc (Intel's compiler) or something like
>>>> > that, we may have no prologue instructions and stepping will work very
>>>> > poorly. But we'll cross that bridge when we get to it.
>>>> >
>>>> > > On Aug 15, 2014, at 8:07 PM, Jason Molenda <jmole...@apple.com> wrote:
>>>> > >
>>>> > > Hi Tong, sorry for the delay in replying.
>>>> > >
>>>> > > I have a couple thoughts about the patch.
>>>> > > First, the change in
>>>> > > RegisterContextLLDB::GetFullUnwindPlanForFrame() forces the use of eh_frame
>>>> > > unwind instructions ("UnwindPlanAtCallSite" - which normally means the
>>>> > > eh_frame unwind instructions) for the currently-executing aka zeroth
>>>> > > frame. We've talked about this before, but it's worth noting that this
>>>> > > patch includes that change.
>>>> > >
>>>> > > There's still the problem of detecting how *asynchronous* those
>>>> > > eh_frame unwind instructions are. For instance, what do you get for an
>>>> > > i386 program that does
>>>> > >
>>>> > > #include <stdio.h>
>>>> > > int main()
>>>> > > {
>>>> > >   puts ("HI");
>>>> > > }
>>>> > >
>>>> > > Most codegen will use a sequence like
>>>> > >
>>>> > >   call LNextInstruction
>>>> > > .LNextInstruction
>>>> > >   pop ebx
>>>> > >
>>>> > > this call & pop sequence is establishing the "pic base"; the
>>>> > > program will then use that address to find the "HI" constant data. If you
>>>> > > compile this -fomit-frame-pointer, so we have to use the stack pointer to
>>>> > > find the CFA, do the eh_frame instructions describe this?
>>>> > >
>>>> > > It's a bit of an extreme example but it's one of those tricky cases
>>>> > > where asynchronous ("accurate at every instruction") unwind instructions
>>>> > > and synchronous ("accurate at places where we can throw an exception, or a
>>>> > > callee can throw an exception") unwind instructions are different.
>>>> > >
>>>> > > I would use behaves_like_zeroth_frame instead of if (IsFrameZero())
>>>> > > because you can have a frame in the middle of the stack which was the
>>>> > > zeroth frame when an asynchronous signal came in -- in which case, the
>>>> > > "callee" stack frame will be sigtramp.
>>>> > >
>>>> > > You'd want to update the UnwindLogMsgVerbose() text, of course.
>>>> > >
>>>> > > What your DWARFCallFrameInfo::PatchUnwindPlanForX86() function is
>>>> > > doing is assuming that the unwind plan fails to include an epilogue
>>>> > > description, and stepping through all the instructions in the function
>>>> > > looking for the epilogue.
>>>> > >
>>>> > > DWARFCallFrameInfo doesn't seem like the right place for this.
>>>> > > There's an assumption that the instructions came from eh_frame and that
>>>> > > they are incomplete. It seems like it would more naturally live in the
>>>> > > UnwindAssembly plugin and it would have a name like
>>>> > > AugmentIncompleteUnwindPlanWithEpilogue or something like that.
>>>> > >
>>>> > > What if the CFI already does describe the epilogue? I imagine
>>>> > > we'll just end up with a doubling of UnwindPlan Rows that describe the
>>>> > > epilogue instructions.
>>>> > >
>>>> > > What if we have a mid-function epilogue? I've never seen gcc/clang
>>>> > > generate these for x86, but it's possible. It's a common code sequence on
>>>> > > arm/arm64. You can see a messy bit of code in
>>>> > > UnwindAssemblyInstEmulation::GetNonCallSiteUnwindPlanFromAssembly which
>>>> > > handles these -- saving the UnwindPlan's unwind instructions when we see
>>>> > > the beginning of an epilogue, and once the epilogue is complete, restoring
>>>> > > the unwind instructions.
>>>> > >
>>>> > > I'm not opposed to the patch - but it does make the assumption that
>>>> > > we're going to use eh_frame for the currently executing function and that
>>>> > > the eh_frame instructions do not include a description of the epilogue
>>>> > > (and that there is only one epilogue in the function). Mostly I want to
>>>> > > call all of those aspects out so we're clear what we're talking about
>>>> > > here. Let's clean it up a bit, put it in and see how it goes.
>>>> > >
>>>> > > J
>>>> > >
>>>> > >> On Aug 14, 2014, at 6:31 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > >>
>>>> > >> Hi Jason,
>>>> > >>
>>>> > >> Turns out we still need CFI for frame 0 in certain situations...
>>>> > >>
>>>> > >> A possible approach is to disassemble machine code, and manually
>>>> > >> adjust CFI for frame 0. For example, if we see "pop ebp; => ret", we set
>>>> > >> cfa to [esp]; if we see "call next-insn; => pop %ebp", we set
>>>> > >> cfa_offset+=4.
>>>> > >>
>>>> > >> Patch attached, now it just implements adjustment for "pop ebp; ret".
>>>> > >>
>>>> > >> If you think this approach is OK, I will go ahead and add other
>>>> > >> tricks (i386 pc relative addressing, more styles of epilogue, etc).
>>>> > >>
>>>> > >> Thank you for your time!
>>>> > >>
>>>> > >> On Thu, Jul 31, 2014 at 12:50 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > >> I think gdb's rationale for using CFI for leaf function is:
>>>> > >> - gcc always generates CFI for the prologue, so at function entry, we
>>>> > >>   know the correct CFA;
>>>> > >> - any stack pointer altering operation after that (mid-function &
>>>> > >>   epilogue), we can recognize and handle them.
>>>> > >> So basically, it assumes 2, hacks its way through 3 & 4, and
>>>> > >> pretends we are at 5.
>>>> > >> Number of hacks we need seems to be small in x86 world, so this
>>>> > >> tradition is still here.
>>>> > >>
>>>> > >> Here's what gdb does for epilogue: normally when you run 'n', it
>>>> > >> will run one instruction at a time till the next line/different stack id.
>>>> > >> But when it sees "pop %rbp; ret", it won't step into these instructions.
>>>> > >> Instead it will execute past them directly.
>>>> > >> I didn't experiment with x86 pc-relative addressing; but I guess
>>>> > >> it will also recognize and execute past this pattern directly.
>>>> > >>
>>>> > >> So for compiler generated functions, what we do now with the assembly
>>>> > >> parser can be done with CFI + those gdb hacks.
>>>> > >> And for hand-written assembly, I think CFI is almost always
>>>> > >> precise at instruction level. In this case, utilizing CFI instead of
>>>> > >> assembly parser will be a big help.
>>>> > >>
>>>> > >> So maybe we can apply those hacks, and trust CFI only for x86 &
>>>> > >> x86_64 targets?
>>>> > >>
>>>> > >> On Thu, Jul 31, 2014 at 12:02 AM, Jason Molenda <jmole...@apple.com> wrote:
>>>> > >> I think we could think of five levels of eh_frame information:
>>>> > >>
>>>> > >> 1 unwind instructions at exception throw locations & locations
>>>> > >>   where a callee may throw an exception
>>>> > >>
>>>> > >> 2 unwind instructions that describe the prologue
>>>> > >>
>>>> > >> 3 unwind instructions that describe the epilogue at the end of the
>>>> > >>   function
>>>> > >>
>>>> > >> 4 unwind instructions that describe mid-function epilogues (I see
>>>> > >>   these on arm all the time, don't see them on x86 with compiler generated
>>>> > >>   code - but we don't use eh_frame on arm at Apple, I'm just mentioning it
>>>> > >>   for completeness)
>>>> > >>
>>>> > >> 5 unwind instructions that describe any changes mid-function
>>>> > >>   needed to unwind at all instructions ("asynchronous unwind information")
>>>> > >>
>>>> > >> The eh_frame section only guarantees #1. gcc and clang always do
>>>> > >> #1 and #2. Modern gcc's do #3. I don't know if gcc would do #4 on arm but
>>>> > >> it's not important, I just mention it for completeness. And no one does #5
>>>> > >> (as far as I know), even in the DWARF debug_frame section.
>>>> > >>
>>>> > >> I think it may be possible to detect if an eh_frame entry fulfills
>>>> > >> #3 by looking if the CFA definition on the last row is the same as the
>>>> > >> initial CFA definition. But I'm not sure how a debugger could use
>>>> > >> heuristics to determine much else.
>>>> > >>
>>>> > >> In fact, detecting #3 may be the easiest thing to detect. I'm not
>>>> > >> sure if the debugger could really detect #2 except maybe if the function
>>>> > >> had a standard prologue (push rbp, mov rsp rbp) and the eh_frame didn't
>>>> > >> describe the effects of these instructions, the debugger could know that
>>>> > >> the eh_frame does not describe the prologue.
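The "#3" check described above (is the last row's CFA definition the same as the initial one?) might look something like the sketch below. `CfaDef` and `LooksLikeEpilogueIsDescribed` are hypothetical names for illustration, not part of lldb; register numbers follow the x86_64 DWARF numbering (6 = rbp, 7 = rsp). As noted elsewhere in this thread, a function that never returns would defeat the heuristic, so a negative answer only means "no evidence".

```cpp
#include <vector>

// Hypothetical, simplified row: CFA = value of DWARF register `reg` + `offset`.
struct CfaDef {
    unsigned reg;
    long offset;
};

inline bool SameCfa(const CfaDef& a, const CfaDef& b) {
    return a.reg == b.reg && a.offset == b.offset;
}

// Heuristic only: if the CFA definition changed during the function and the
// final row restores the definition from function entry, the eh_frame most
// likely describes the epilogue at the end of the function.
bool LooksLikeEpilogueIsDescribed(const std::vector<CfaDef>& rows) {
    // With fewer than two rows the CFA never changes, so there is nothing to
    // compare; report "no evidence" rather than guessing.
    if (rows.size() < 2) return false;
    return SameCfa(rows.front(), rows.back());
}
```

Fed the CFA columns from the dwarfdump output quoted further down this thread, the clang FDE (rsp+8, rsp+16, rbp+16) is reported as not describing the epilogue, while the gcc FDE (rsp+8, rsp+16, rbp+16, rsp+8) is reported as describing it.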
>>>> > >>
>>>> > >>> On Jul 30, 2014, at 6:58 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > >>>
>>>> > >>> Ah I understand now.
>>>> > >>>
>>>> > >>> Now prologue seems always included in CFI for gcc & clang; and
>>>> > >>> newer gcc includes epilogue as well.
>>>> > >>> Maybe we can detect and use them when they are available?
>>>> > >>>
>>>> > >>> On Wed, Jul 30, 2014 at 6:44 PM, Jason Molenda <jmole...@apple.com> wrote:
>>>> > >>> Ah, it looks like gcc changed since I last looked at its eh_frame
>>>> > >>> output.
>>>> > >>>
>>>> > >>> It's not a bug -- the eh_frame unwind instructions only need to
>>>> > >>> be accurate at instructions where an exception can be thrown, or where a
>>>> > >>> callee function can throw an exception. There's no requirement to include
>>>> > >>> prologue or epilogue instructions in the eh_frame.
>>>> > >>>
>>>> > >>> And unfortunately from lldb's perspective, when we see eh_frame
>>>> > >>> we'll never know how descriptive it is. If it's old-gcc or clang, it won't
>>>> > >>> include epilogue instructions. If it's from another compiler, it may not
>>>> > >>> include any prologue/epilogue instructions at all.
>>>> > >>>
>>>> > >>> Maybe we could look over the UnwindPlan rows and see if the CFA
>>>> > >>> definition of the last row matches the initial row's CFA definition. That
>>>> > >>> would show that the epilogue is described. Unless it is a tail-call (aka
>>>> > >>> noreturn) function - in which case the stack is never restored.
>>>> > >>>
>>>> > >>>> On Jul 30, 2014, at 6:32 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > >>>>
>>>> > >>>> GCC seems to generate a row for epilogue.
>>>> > >>>> Do you think this is a clang bug, or at least a discrepancy
>>>> > >>>> between clang & gcc?
>>>> > >>>>
>>>> > >>>> Source:
>>>> > >>>> int f() {
>>>> > >>>>         puts("HI\n");
>>>> > >>>>         return 5;
>>>> > >>>> }
>>>> > >>>>
>>>> > >>>> Compile option: only -g
>>>> > >>>>
>>>> > >>>> gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
>>>> > >>>> clang version 3.5.0 (213114)
>>>> > >>>>
>>>> > >>>> Env: Ubuntu 14.04, x86_64
>>>> > >>>>
>>>> > >>>> dwarfdump -F of clang binary:
>>>> > >>>> < 2><0x00400530:0x00400559><f><fde offset 0x00000088 length: 0x0000001c><eh aug data len 0x0>
>>>> > >>>>     0x00400530: <off cfa=08(r7) > <off r16=-8(cfa) >
>>>> > >>>>     0x00400531: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>> > >>>>     0x00400534: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>> > >>>>
>>>> > >>>> dwarfdump -F of gcc binary:
>>>> > >>>> < 1><0x0040052d:0x00400542><f><fde offset 0x00000070 length: 0x0000001c><eh aug data len 0x0>
>>>> > >>>>     0x0040052d: <off cfa=08(r7) > <off r16=-8(cfa) >
>>>> > >>>>     0x0040052e: <off cfa=16(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>> > >>>>     0x00400531: <off cfa=16(r6) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>> > >>>>     0x00400541: <off cfa=08(r7) > <off r6=-16(cfa) > <off r16=-8(cfa) >
>>>> > >>>>
>>>> > >>>> On Wed, Jul 30, 2014 at 5:43 PM, Jason Molenda <jmole...@apple.com> wrote:
>>>> > >>>> I'm open to trying to trust eh_frame at frame 0 for x86_64. The
>>>> > >>>> lack of epilogue descriptions in eh_frame is the biggest problem here.
>>>> > >>>>
>>>> > >>>> When you "step" or "next" in the debugger, the debugger
>>>> > >>>> instruction steps across the source line until it gets to the next source
>>>> > >>>> line. Every time it stops after an instruction step, it confirms that it
>>>> > >>>> is (1) between the start and end pc values for the source line, and (2)
>>>> > >>>> that the "stack id" (start address of the function + CFA address) is the
>>>> > >>>> same.
>>>> > >>>> If it stops and the stack id has changed, for a "next" command, it
>>>> > >>>> will backtrace one stack frame to see if it stepped into a function. If
>>>> > >>>> so, it sets a breakpoint on the return address and continues.
>>>> > >>>>
>>>> > >>>> If you switch lldb to prefer eh_frame instructions for x86_64, e.g.
>>>> > >>>>
>>>> > >>>> Index: source/Plugins/Process/Utility/RegisterContextLLDB.cpp
>>>> > >>>> ===================================================================
>>>> > >>>> --- source/Plugins/Process/Utility/RegisterContextLLDB.cpp (revision 214344)
>>>> > >>>> +++ source/Plugins/Process/Utility/RegisterContextLLDB.cpp (working copy)
>>>> > >>>> @@ -791,6 +791,22 @@
>>>> > >>>>              }
>>>> > >>>>          }
>>>> > >>>>
>>>> > >>>> +        // For x86_64 debugging, let's try using the eh_frame instructions even if this is the currently
>>>> > >>>> +        // executing function (frame zero).
>>>> > >>>> +        Target *target = exe_ctx.GetTargetPtr();
>>>> > >>>> +        if (target
>>>> > >>>> +            && (target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64h
>>>> > >>>> +                || target->GetArchitecture().GetCore() == ArchSpec::eCore_x86_64_x86_64))
>>>> > >>>> +        {
>>>> > >>>> +            unwind_plan_sp = func_unwinders_sp->GetUnwindPlanAtCallSite (m_current_offset_backed_up_one);
>>>> > >>>> +            int valid_offset = -1;
>>>> > >>>> +            if (IsUnwindPlanValidForCurrentPC(unwind_plan_sp, valid_offset))
>>>> > >>>> +            {
>>>> > >>>> +                UnwindLogMsgVerbose ("frame uses %s for full UnwindPlan, preferred over assembly profiling on x86_64", unwind_plan_sp->GetSourceName().GetCString());
>>>> > >>>> +                return unwind_plan_sp;
>>>> > >>>> +            }
>>>> > >>>> +        }
>>>> > >>>> +
>>>> > >>>>          // Typically the NonCallSite UnwindPlan is the unwind created by inspecting the assembly language instructions
>>>> > >>>>          if (behaves_like_zeroth_frame)
>>>> > >>>>          {
>>>> > >>>>
>>>> > >>>> you'll find that you have to "next" twice to step out of a
>>>> > >>>> function. Why?
>>>> > >>>> With a simple function like:
>>>> > >>>>
>>>> > >>>> * thread #1: tid = 0xaf31e, 0x0000000100000eb9 a.out`foo + 25 at a.c:5, queue = 'com.apple.main-thread', stop reason = step over
>>>> > >>>>     #0: 0x0000000100000eb9 a.out`foo + 25 at a.c:5
>>>> > >>>>    2    int foo ()
>>>> > >>>>    3    {
>>>> > >>>>    4        puts("HI");
>>>> > >>>> -> 5        return 5;
>>>> > >>>>    6    }
>>>> > >>>>    7
>>>> > >>>>    8    int bar ()
>>>> > >>>> (lldb) disass
>>>> > >>>> a.out`foo at a.c:3:
>>>> > >>>>    0x100000ea0:  pushq  %rbp
>>>> > >>>>    0x100000ea1:  movq   %rsp, %rbp
>>>> > >>>>    0x100000ea4:  subq   $0x10, %rsp
>>>> > >>>>    0x100000ea8:  leaq   0x6b(%rip), %rdi          ; "HI"
>>>> > >>>>    0x100000eaf:  callq  0x100000efa               ; symbol stub for: puts
>>>> > >>>>    0x100000eb4:  movl   $0x5, %ecx
>>>> > >>>> -> 0x100000eb9:  movl   %eax, -0x4(%rbp)
>>>> > >>>>    0x100000ebc:  movl   %ecx, %eax
>>>> > >>>>    0x100000ebe:  addq   $0x10, %rsp
>>>> > >>>>    0x100000ec2:  popq   %rbp
>>>> > >>>>    0x100000ec3:  retq
>>>> > >>>>
>>>> > >>>> if you do "next" lldb will instruction step, comparing the stack
>>>> > >>>> ID at every stop, until it gets to 0x100000ec3 at which point the stack ID
>>>> > >>>> will change. The CFA address (which the eh_frame tells us is rbp+16) just
>>>> > >>>> changed to the caller's CFA address because we're about to return. The
>>>> > >>>> eh_frame instructions really need to tell us that the CFA is now rsp+8 at
>>>> > >>>> 0x100000ec3.
>>>> > >>>>
>>>> > >>>> The end result is that you need to "next" twice to step out of a
>>>> > >>>> function.
>>>> > >>>>
>>>> > >>>> AssemblyParse_x86 has a special bit where it looks for the 'ret'
>>>> > >>>> instruction sequence at the end of the function -
>>>> > >>>>
>>>> > >>>>    // Now look at the byte at the end of the AddressRange for a limited attempt at describing the
>>>> > >>>>    // epilogue.
>>>> > >>>>    // We're looking for the sequence
>>>> > >>>>
>>>> > >>>>    //  [ 0x5d ] pop %rbp
>>>> > >>>>    //  [ 0xc3 ] ret
>>>> > >>>>    //  [ 0xe8 xx xx xx xx ] call __stack_chk_fail  (this is sometimes the final insn in the function)
>>>> > >>>>
>>>> > >>>>    // We want to add a Row describing how to unwind when we're stopped on the 'ret' instruction where the
>>>> > >>>>    // CFA is no longer defined in terms of rbp, but is now defined in terms of rsp like on function entry.
>>>> > >>>>
>>>> > >>>> and adds an extra row of unwind details for that instruction.
>>>> > >>>>
>>>> > >>>> I mention x86_64 as being a possible good test case here because
>>>> > >>>> I worry about the i386 picbase sequence (call next-instruction; pop %ebx)
>>>> > >>>> which occurs a lot. But for x86_64, my main concern is the epilogues.
>>>> > >>>>
>>>> > >>>>> On Jul 30, 2014, at 2:52 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > >>>>>
>>>> > >>>>> Thanks Jason! That's a very informative post, clarifies things a
>>>> > >>>>> lot :-)
>>>> > >>>>>
>>>> > >>>>> Well I have to admit that my patch is specifically for a certain
>>>> > >>>>> kind of function, and now I see that's not the general case.
>>>> > >>>>>
>>>> > >>>>> I did some experiments with gdb. gdb uses CFI for frame 0,
>>>> > >>>>> either x86 or x86_64. It looks for the FDE of frame 0, and does CFA
>>>> > >>>>> calculations according to that.
>>>> > >>>>>
>>>> > >>>>> - For compiler generated functions: I think there are 2 usage
>>>> > >>>>>   scenarios for frame 0: breakpoint and signal.
>>>> > >>>>>   - Breakpoints are usually at source line boundary instead of
>>>> > >>>>>     instruction boundary, and generally we won't be caught at stack pointer
>>>> > >>>>>     changing locations, so CFI is still valid.
>>>> > >>>>>   - For signal, synchronous unwind table may not be sufficient
>>>> > >>>>>     here. But only stack changing instructions will cause incorrect CFA
>>>> > >>>>>     calculation, so it's not always the case.
>>>> > >>>>> - For hand written assembly functions: from what I've seen,
>>>> > >>>>>   most of the time CFI is present and actually asynchronous.
>>>> > >>>>> So it seems that in most cases, even with only synchronous
>>>> > >>>>> unwind table, CFI is still correct.
>>>> > >>>>>
>>>> > >>>>> I believe we can trust eh_frame for frame 0 and use assembly
>>>> > >>>>> profiling as fallback. If both failed, maybe code owner should use
>>>> > >>>>> -fasynchronous-unwind-tables :-)
>>>> > >>>>>
>>>> > >>>>> On Tue, Jul 29, 2014 at 4:59 PM, Jason Molenda <jmole...@apple.com> wrote:
>>>> > >>>>> It was a tricky one and got lost in the shuffle of a busy
>>>> > >>>>> week. I was always reluctant to try profiling all the instructions in a
>>>> > >>>>> function. On x86, compiler generated code (gcc/clang anyway) is very
>>>> > >>>>> simplistic about setting up the stack frame at the start and only having
>>>> > >>>>> one epilogue - so anything fancier risked making mistakes and could
>>>> > >>>>> possibly have a performance impact as we run functions through the
>>>> > >>>>> disassembler.
>>>> > >>>>>
>>>> > >>>>> For hand-written assembly functions (which can be very creative
>>>> > >>>>> with their prologue/epilogue and where it is placed), my position is that
>>>> > >>>>> they should write eh_frame instructions in their assembly source to tell
>>>> > >>>>> lldb where to find things. There are one or two libraries on Mac OS X where
>>>> > >>>>> we break the "ignore eh_frame for the currently executing function" rule
>>>> > >>>>> because there are many hand-written assembly functions in there and the
>>>> > >>>>> eh_frame is going to beat our own analysis.
>>>> > >>>>>
>>>> > >>>>> After I wrote the x86 unwinder, Greg and Caroline implemented
>>>> > >>>>> the arm unwinder where it emulates every instruction in the function
>>>> > >>>>> looking for prologue/epilogue instructions.
>>>> > >>>>> We haven't seen it having a
>>>> > >>>>> particularly bad impact performance-wise (lldb only does this disassembly
>>>> > >>>>> for functions that it finds on stacks during an execution run, and it saves
>>>> > >>>>> the result so it won't re-compute it for a given function). The clang
>>>> > >>>>> armv7 codegen often has mid-function epilogues (early returns) which
>>>> > >>>>> definitely complicated things and made it necessary to step through the
>>>> > >>>>> entire function bodies. There's a bunch of code I added to support these
>>>> > >>>>> mid-function epilogues - I have to save the register save state when I see
>>>> > >>>>> an instruction which looks like an epilogue, and when I see the final ret
>>>> > >>>>> instruction (aka restoring the saved lr contents into pc), I re-install the
>>>> > >>>>> register save state from before the epilogue started.
>>>> > >>>>>
>>>> > >>>>> These things always make me a little nervous because the
>>>> > >>>>> instruction analyzer obviously is doing a static analysis so it knows
>>>> > >>>>> nothing about flow control. Tong's patch stops when it sees the first CALL
>>>> > >>>>> instruction - but that's not right, that's just solving the problem for his
>>>> > >>>>> particular function which doesn't have any CALL instructions before his
>>>> > >>>>> prologue. :) You could imagine a function which saves a couple of
>>>> > >>>>> registers, calls another function, then saves a couple more because it
>>>> > >>>>> needs more scratch registers.
>>>> > >>>>>
>>>> > >>>>> If we're going to change to profiling deep into the function --
>>>> > >>>>> and I'm not opposed to doing that, it's been fine on arm -- we should just
>>>> > >>>>> do the entire function I think.
>>>> > >>>>>
>>>> > >>>>> Another alternative would be to trust eh_frame on x86_64 at
>>>> > >>>>> frame 0. This is one of those things where there's not a great solution.
>>>> > >>>>> The unwind instructions in eh_frame are only guaranteed to be accurate for
>>>> > >>>>> synchronous unwinds -- that is, they are only guaranteed to be accurate at
>>>> > >>>>> places where an exception could be thrown - at call sites.
>>>> > >>>>> So for instance, there's no reason why the compiler has to describe the
>>>> > >>>>> function prologue instructions at all. There's no requirement that the
>>>> > >>>>> eh_frame instructions describe the epilogue instructions. The information
>>>> > >>>>> about spilled registers only needs to be emitted where we could throw an
>>>> > >>>>> exception, or where a callee could throw an exception.
>>>> > >>>>>
>>>> > >>>>> clang/gcc both emit detailed instructions for the prologue
>>>> > >>>>> setup. But for i386 codegen if the compiler needs to access some
>>>> > >>>>> pc-relative data, it will do a "call next-instruction; pop %eax" to get the
>>>> > >>>>> current pc value. (x86_64 has rip-relative addressing so this isn't
>>>> > >>>>> needed.) If you're debugging -fomit-frame-pointer code, that means your CFA
>>>> > >>>>> is expressed in terms of the stack pointer and the stack pointer just
>>>> > >>>>> changed mid-function -- and eh_frame instructions don't describe this.
>>>> > >>>>>
>>>> > >>>>> The end result: If you want accurate unwinds 100% of the time,
>>>> > >>>>> you can't rely on the unwind instructions from eh_frame. But they'll get
>>>> > >>>>> you accurate unwinds 99.9% of the time ... also, last I checked, neither
>>>> > >>>>> clang nor gcc describe the epilogue instructions.
>>>> > >>>>>
>>>> > >>>>> In *theory* the unwind instructions from the DWARF debug_frame
>>>> > >>>>> section should be asynchronous -- they should describe how to find the CFA
>>>> > >>>>> address for every instruction in the function. Which makes sense - you
>>>> > >>>>> want eh_frame to be compact because it's bundled into the executable, so it
>>>> > >>>>> should only have the information necessary for exception handling and you
>>>> > >>>>> can put the verbose stuff in debug_frame DWARF for debuggers. But instead
>>>> > >>>>> (again, last time I checked), the compilers put the exact same thing in
>>>> > >>>>> debug_frame even if you use the -fasynchronous-unwind-tables (or whatever
>>>> > >>>>> that switch was) option.
>>>> > >>>>>
>>>> > >>>>> So I don't know, maybe we should just start trusting eh_frame
>>>> > >>>>> at frame 0 and write off those .1% cases where it isn't correct instead of
>>>> > >>>>> trying to get too fancy with the assembly analysis code.
>>>> > >>>>>
>>>> > >>>>>> On Jul 29, 2014, at 4:17 PM, Todd Fiala <tfi...@google.com> wrote:
>>>> > >>>>>>
>>>> > >>>>>> Hey Jason,
>>>> > >>>>>>
>>>> > >>>>>> Do you have any feedback on this?
>>>> > >>>>>>
>>>> > >>>>>> Thanks!
>>>> > >>>>>>
>>>> > >>>>>> -Todd
>>>> > >>>>>>
>>>> > >>>>>> On Fri, Jul 25, 2014 at 1:42 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > >>>>>> Sorry, wrong version of patch...
>>>> > >>>>>>
>>>> > >>>>>> On Fri, Jul 25, 2014 at 1:41 PM, Tong Shen <endlessr...@google.com> wrote:
>>>> > >>>>>> Hi Molenda, lldb-commits,
>>>> > >>>>>>
>>>> > >>>>>> For now, x86 assembly profiler will stop after 10
>>>> > >>>>>> "non-prologue" instructions. In practice it may not be sufficient. For
>>>> > >>>>>> example, we have a hand-written assembly function, which has hundreds of
>>>> > >>>>>> instructions before the actual (stack-adjusting) prologue instructions.
>>>> > >>>>>>
>>>> > >>>>>> One way is to change the limit to 1000; but there will always
>>>> > >>>>>> be functions that break the limit :-) I believe the right thing to do here
>>>> > >>>>>> is parsing all instructions before "ret"/"call" as prologue instructions.
>>>> > >>>>>>
>>>> > >>>>>> Here's what I changed:
>>>> > >>>>>> - For "push %rbx" and "mov %rbx, -8(%rbp)": only add first row
>>>> > >>>>>>   for that register. They may appear multiple times in function body. But as
>>>> > >>>>>>   long as one of them appears, first appearance should be in prologue (if it's
>>>> > >>>>>>   not in prologue, this function will not use %rbx, so these 2 instructions
>>>> > >>>>>>   should not appear at all).
>>>> > >>>>>> - Also monitor "add %rsp 0x20".
>>>> > >>>>>> - Remove non prologue instruction count.
>>>> > >>>>>> - Add "call" instruction detection, and stop parsing after it.
>>>> > >>>>>>
>>>> > >>>>>> Thanks.
>>>> > >>>>>>
>>>> > >>>>>> --
>>>> > >>>>>> Best Regards, Tong Shen
>>>> > >>>>>>
>>>> > >>>>>> _______________________________________________
>>>> > >>>>>> lldb-commits mailing list
>>>> > >>>>>> lldb-commits@cs.uiuc.edu
>>>> > >>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-commits
>>>> > >>>>>>
>>>> > >>>>>> --
>>>> > >>>>>> Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180
>>>> > >>
>>>> > >> <adjust_cfi_for_frame_zero.patch>

--
Todd Fiala | Software Engineer | tfi...@google.com | 650-943-3180