> Hi Ashok, thanks for working on this -- I know the unwinder code can be
> hard to modify, RegisterContextLLDB.cpp is a little complex in places. :/

For sure, Jason, thanks for the sophisticated unwinder.

> A recent change to ObjectFileMachO is that it also gets the function start
> addresses from the eh_frame information if LC_FUNCTION_STARTS doesn't exist:

Nice, I see how that's an advantage in spite of the performance hit. I'll certainly look at reworking ObjectFileELF to add function symbols for stripped functions from the eh_frame information.

> Let me know what you think.

Perhaps the best approach is to do both. Having my suggested new code path in the unwinder isn't fundamentally wrong or a performance concern. On the contrary, it unblocks Linux core file support and resolves a high-profile bug for a common use case. I think it also improves the applicability of the unwinder while we look for improvements in other object-file formats (e.g. ObjectFilePECOFF).

If you like the idea, I'm happy to commit & improve,

- Ashok

On Jun 7, 2013, at 11:46 AM, "Thirumurthi, Ashok" <[email protected]> wrote:

> Hi Jason,
>
>> Frame 2 did not get a valid CFA for this frame, stopping stack walk
>
> So, the attached patch allows the unwinder to get past frame 2 using eh_frame
> information that is dug up based on the pc rather than the start address of
> the function (i.e. to handle the case where the function symbol is
> unavailable).
>
> This fix is coupled with GetFullUnwindPlanForFrame rather than lowered to
> UnwindTable and FuncUnwinders. Alternatively, I could add or modify routines
> like GetFuncUnwindersContainingAddress to avoid the requirement for a
> SymbolContext. Similarly, I could add or modify routines like
> GetUnwindPlanAtCallSite to allow the caller to specify a pc.
>
> The attached patch also slides m_current_pc in the case where a Symbol is
> found at pc - 1. Note that the log while adding frame 2 indicates a bogus fp:
> th1/fr2 supplying caller's register 6 from the stack, saved at CFA plus offset
> th1/fr3 fp = 0x00000000004006db
>
> The slide keeps me out of the weeds while adding frame 3 (see the attached
> log).
> The combined result is a healthy stack:
>
> (lldb) bt
> * thread #1: tid = 0x2987, 0x00007ffba7b23425 libc.so.6`raise + 53, stop reason = signal SIGABRT
>     frame #0: 0x00007ffba7b23425 libc.so.6`raise + 53
>     frame #1: 0x00007ffba7b26b8b libc.so.6`abort + 379
>     frame #2: 0x00007ffba7b1c0ee libc.so.6
>     frame #3: 0x00007ffba7b1c192 libc.so.6`__assert_fail + 66
>     frame #4: 0x00000000004005c0 a.out`main(argc=1, argv=0x00007fff1ccfbd68) + 112 at main.c:18
>     frame #5: 0x00007ffba7b0e76d libc.so.6`__libc_start_main + 237
>     frame #6: 0x0000000000400489 a.out`_start + 41
>
> Perhaps it would be helpful to provide a slightly different entry for frame
> #2 like:
>     frame #2: 0x00007ffba7b1c0ee libc.so.6`??? + offset
>
> For now, I set eSkipFrame, which is documented as a frame state that indicates
> that the unwinder found issues and is hoping to recover. Perhaps a new value
> would better document the fact that the frame goes with a function with no
> known symbol.
>
> I'll commit this patch by next Monday since this is an important use
> case for lldb 3.3 (and I assume that WWDC is all encompassing for a
> bit), but do fire away with any feedback. Cheers,
>
> - Ashok
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Thirumurthi, Ashok
> Sent: Tuesday, May 28, 2013 10:52 AM
> To: [email protected]
> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated
> after assertion failure in inferior
>
> FYI, gdb can identify the frame addresses for/relative to mystery frame 2
> while at the assert site:
>
> (gdb) f 2
> #2 0x00007ffff7a4a0ee in ??
> () from /lib/x86_64-linux-gnu/libc.so.6
>
> (gdb) info frame
> Stack level 2, frame at 0x7fffffffdee0:
>  rip = 0x7ffff7a4a0ee; saved rip 0x7ffff7a4a192
>  called by frame at 0x7fffffffdf10, caller of frame at 0x7fffffffde80
>  Arglist at 0x7fffffffde78, args:
>  Locals at 0x7fffffffde78, Previous frame's sp is 0x7fffffffdee0
>  Saved registers:
>   rbx at 0x7fffffffdec0, rbp at 0x7fffffffdec8, r12 at 0x7fffffffded0,
>   rip at 0x7fffffffded8
>
> - Ashok
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Thirumurthi, Ashok
> Sent: Monday, May 27, 2013 5:09 PM
> To: [email protected]
> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated
> after assertion failure in inferior
>
> Hi Jason,
>
> So, this thread is still relevant and reproducible using
> functionalities/inferior-asserting on platforms where libc.so is compiled
> with -fomit-frame-pointer.
>
>>>> The only solution I can think of here is if abort()'s eh_frame does
>>>> provide a saved location for rbp but lldb failed to read it correctly.
>>>> Else, I have no idea how gdb managed to unwind out of this one.
>
> FYI, the routine RegisterContextLLDB::InitializeNonZerothFrame calls
> ReadGPRValue for active_row->GetCFARegister(), which allows m_cfa to be set
> for frame 1 'abort'. When this routine runs for the mystery frame 2,
> m_sym_ctx.GetAddressRange comes up empty-handed (consistent with gdb's
> backtrace), so addr_range.GetBaseAddress() is not valid. As a result,
> m_current_offset is -1, and this routine returns before m_cfa is read,
> resulting in an invalid frame.
>
>
>> But in this particular backtrace we've got -fomit-frame-pointer frames using
>> eh_frame, then one function that doesn't have any symbol name or eh_frame
>> entry, and I honestly have no idea how gdb found its way out of that one.
>
> Even if the function for frame 2 doesn't have a symbol name, is it possible
> that it has an eh_frame entry that we can use?
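
The idea raised here, finding unwind information from the pc when no function symbol is available, amounts to a range lookup over the eh_frame FDE table. Below is a minimal sketch, assuming the FDEs have already been parsed into sorted, non-overlapping (start, length) pairs. The `Fde` type and the `find_fde_for_pc` and `offset_in_function` helpers are hypothetical names for illustration, not lldb APIs.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Fde(NamedTuple):
    """Hypothetical, already-parsed eh_frame entry: the address range it covers."""
    start: int   # first address covered by this entry
    length: int  # number of bytes covered


def find_fde_for_pc(fdes: Sequence[Fde], pc: int) -> Optional[Fde]:
    """Return the FDE whose range contains pc, or None.

    Assumes fdes is sorted by start address and ranges do not overlap.
    """
    starts = [f.start for f in fdes]
    i = bisect_right(starts, pc) - 1  # last FDE starting at or before pc
    if i < 0:
        return None
    fde = fdes[i]
    return fde if pc < fde.start + fde.length else None


def offset_in_function(fdes: Sequence[Fde], pc: int) -> int:
    """Offset of pc from the start of its FDE, or -1 if no FDE covers it."""
    fde = find_fde_for_pc(fdes, pc)
    return pc - fde.start if fde is not None else -1
```

With a table like this, the offset of the pc from the start of its FDE could stand in for the function offset that is otherwise derived from the symbol, so a missing symbol no longer forces an offset of -1.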
>
>>>> The only reasonable approach here would be to assume that this frame used
>>>> a frame pointer (rbp), grab the saved rbp value and try to find the
>>>> caller's pc based on that -- but that failed.
>
> So, I see the code that executes to handle the case where a function ends
> with a call instruction, which backs up the PC by one byte. However,
> ResolveSymbolContextForAddress fails, and SymbolContext::GetAddressRange
> comes up empty-handed because the function member is 0, so addr_range is not
> set by this code.
>
> Without a function symbol, is there a way to set m_current_offset so
> that ReadGPRValue can read the saved rbp for frame 2? Thanks,
>
> - Ashok
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Langmuir, Ben
> Sent: Monday, April 08, 2013 10:12 AM
> To: Luddy Harrison; Jason Molenda
> Cc: [email protected]
> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated
> after assertion failure in inferior
>
> I've updated bugzilla with the output of image show-unwind -n abort. I
> couldn't attach the output of readelf -wf libc.so.6 (too big) - is there a
> way to only show info about the abort function? The name 'abort' doesn't
> appear in the output.
>
> Ben
>
> -----Original Message-----
> From: Luddy Harrison [mailto:[email protected]]
> Sent: Monday, April 08, 2013 6:18 AM
> To: Jason Molenda
> Cc: Langmuir, Ben; [email protected]
> Subject: Re: [lldb-dev] regarding [Bug 15671] New: backtrace truncated
> after assertion failure in inferior
>
> hi, just to clarify, I regularly write asm with no eh frames or function
> bounds, no .cfi. gdb unwinds my leaf functions fine.
> it is my impression that gdb will, in the absence of frame info, assume that
> the topmost item on the stack at a trap is a return pc (even though the
> trapped pc cannot be identified and has invalid rbp, so disasm of the leaf
> itself is not possible).
>
> put differently, if one can't figure out the leaf, one can grope for the
> return pc on the stack and try again at the caller. if the return pc points
> just after a plausible-looking call insn then you're good. hope that makes
> sense...
>
> Sent from my iPhone
>
> On 8 Apr, 2013, at 17:43, Jason Molenda <[email protected]> wrote:
>
>> Yeah, lldb uses similar tricks. If you have eh_frame instructions,
>> unwinding from -fomit-frame-pointer code is easy. And if you have accurate
>> function bounds for all the frames, lldb can usually manage to unwind an
>> -fomit-frame-pointer stack without eh_frame (because it inspects the actual
>> assembly instructions in the prologue to understand the stack setup). But
>> in this particular backtrace we've got -fomit-frame-pointer frames using
>> eh_frame, then one function that doesn't have any symbol name or eh_frame
>> entry, and I honestly have no idea how gdb found its way out of that one.
>> The only reasonable approach here would be to assume that this frame used a
>> frame pointer (rbp), grab the saved rbp value and try to find the caller's
>> pc based on that -- but that failed.
>>
>> Well, maybe the additional information from Ben (the eh_frame instructions
>> for abort() most importantly) will provide a hint. The only thing I can
>> think is that maybe lldb misinterpreted that function's eh_frame
>> instructions.
>>
>> J
>>
>>
>> On Apr 8, 2013, at 1:20 AM, Luddy Harrison wrote:
>>
>>> having done lots of asm debugging with gdb, I can offer a guess. gdb seems
>>> to be able to unwind frameless leaf functions with no unwind info. so
>>> perhaps as a final fallback it pops the top entry on the stack and treats
>>> it as the return pc.
>>> if it can unwind the caller using that pc, then it is
>>> good.
>>>
>>> just a guess...
>>>
>>> -Luddy
>>>
>>> Sent from my iPhone
>>>
>>> On 8 Apr, 2013, at 6:01, Jason Molenda <[email protected]> wrote:
>>>
>>>> I see what's going on here.
>>>>
>>>> /lib/x86_64-linux-gnu/libc.so.6 was built -fomit-frame-pointer, and
>>>> it includes eh_frame instructions on how to unwind the frames. But
>>>> when lldb gets to
>>>>
>>>> #2 0x00007ffff7a4a0ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>>
>>>> it doesn't have any eh_frame instructions. lldb can figure out the stack
>>>> pointer value (from frame 1) which tells us the "bottom" of this stack
>>>> frame but it can't find the "top" without eh_frame unwind instructions or
>>>> knowing what function it is in so it can do an assembly instruction scan
>>>> to understand how the stack frame was set up. lldb tries to get a saved
>>>> frame pointer (rbp) which would give us the "top" of the stack frame but
>>>> the saved rbp value it gets (0x40067e0) is obviously invalid.
>>>>
>>>> It might be interesting to see the output of
>>>>
>>>> image show-unwind -n abort
>>>>
>>>> to see exactly what the eh_frame instructions read (this is lldb's
>>>> interpretation of the eh_frame instructions, of course; it might be
>>>> useful to include the output of readelf -wf libc.so.6 or readelf -wF
>>>> libc.so.6 for the abort() function, going by a web page for readelf
>>>> I found on the web.)
>>>> The log output included this,
>>>>
>>>> th1/fr0 supplying caller's saved reg 16's location, cached
>>>> th1/fr1 requested caller's saved PC but this UnwindPlan uses a RA reg; getting reg 16 instead
>>>> th1/fr1 supplying caller's saved reg 16's location using eh_frame CFI UnwindPlan
>>>> th1/fr1 supplying caller's register 16 from the stack, saved at CFA plus offset
>>>> th1/fr2 pc = 0x00007f216e4850ee
>>>>
>>>> That bit about "this UnwindPlan uses a RA reg" is novel for x86 code; it's
>>>> normally something you see in ARM code, where the caller's saved pc value
>>>> is in the link register on a function call. But as you'd guess from the
>>>> name abort(), this may have the caller's register context saved in an
>>>> unusual way so this may be fine.
>>>>
>>>> I'm surprised gdb can unwind this successfully.
>>>>
>>>> As I alluded to above, lldb can profile the assembly language instructions
>>>> of a function to understand the prologue setup (where registers are saved,
>>>> how the stack is set up, etc.) -- but to do this, it needs to know the
>>>> start address of the function. This "#2 0x00007ffff7a4a0ee in ?? ()"
>>>> frame clearly doesn't have any symbolic information with its address range
>>>> so lldb can't do its assembly scan. And it doesn't have eh_frame
>>>> instructions to help either.
>>>>
>>>> On Mac OS X we're often working with binaries that have had most of their
>>>> symbols stripped. Because it is so valuable to lldb to have accurate
>>>> function ranges, we supplement the symbol table with two sources: The
>>>> LC_FUNCTION_STARTS section, and barring that (this is new), the eh_frame
>>>> section. LC_FUNCTION_STARTS is an array of LEB128 encoded offsets of all
>>>> the start addresses of the functions in the file. The first function is
>>>> at offset 0, etc. It's real compact, typically a few bytes per function.
>>>> The eh_frame section is another great source of function bounds
>>>> information but it tends to be larger and slower to parse through. lldb
>>>> adds fake symbol names for these function ranges that it adds, e.g. a fake
>>>> symbol added to the program Dock might be
>>>> "__lldb_unnamed_function3491$$Dock".
>>>>
>>>> Of course, given that lldb couldn't find eh_frame instructions for "#2
>>>> 0x00007ffff7a4a0ee in ?? ()", maybe even that wouldn't have helped.
>>>>
>>>>
>>>> The only solution I can think of here is if abort()'s eh_frame does
>>>> provide a saved location for rbp but lldb failed to read it correctly.
>>>> Else, I have no idea how gdb managed to unwind out of this one.
>>>>
>>>>
>>>> On Apr 7, 2013, at 5:46 AM, Langmuir, Ben wrote:
>>>>
>>>>> Done.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jason Molenda [mailto:[email protected]]
>>>>> Sent: Sunday, April 07, 2013 5:50 AM
>>>>> To: Langmuir, Ben
>>>>> Subject: regarding [Bug 15671] New: backtrace truncated after
>>>>> assertion failure in inferior
>>>>>
>>>>> I don't know if I have a bugzilla account on llvm.org (I should
>>>>> but I don't know what password it might have) but I wanted to ask
>>>>> you to do
>>>>>
>>>>> (lldb) log enable lldb unwind
>>>>> (lldb) run
>>>>> (lldb) bt
>>>>>
>>>>>
>>>>> and attach that output to
>>>>> http://llvm.org/bugs/show_bug.cgi?id=15671
>>>>>
>>>>> lldb should use a DefaultUnwindPlan for frame 2 ("?? ()" in gdb's
>>>>> backtrace) to continue the unwind. I don't have linux installed on any
>>>>> devices so I haven't looked but the output will probably be a good clue
>>>>> as to why the unwind stopped early.
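
The compact LC_FUNCTION_STARTS encoding described earlier in this thread can be illustrated with a small decoder. This is a minimal sketch under stated assumptions: the payload is taken to be a run of unsigned LEB128 values, each giving the distance from the previous function start (the first relative to a base address), with a zero value ending the list. The function names are illustrative and are not lldb's.

```python
from typing import List, Tuple


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:             # a clear high bit marks the last byte
            return value, pos
        shift += 7


def function_starts(data: bytes, base: int = 0) -> List[int]:
    """Turn a delta-encoded ULEB128 payload into absolute start addresses.

    Assumption: each value is a delta from the previous start, and a zero
    delta terminates the table.
    """
    starts: List[int] = []
    addr = base
    pos = 0
    while pos < len(data):
        delta, pos = read_uleb128(data, pos)
        if delta == 0:
            break
        addr += delta
        starts.append(addr)
    return starts
```

Because small deltas fit in one or two bytes, a table like this stays at a few bytes per function, and consecutive starts give approximate function bounds.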
>>>>>
>>>>> J
>>>>
>>>>
>>>> _______________________________________________
>>>> lldb-dev mailing list
>>>> [email protected]
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>
>
> <pr15671.patch><unwind-full.txt>
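
The fallback Luddy suggests in this thread (take the word at the top of the stack as a candidate return pc, and accept it only if it sits just after a plausible-looking call instruction) can be sketched for x86-64 as below. This is a rough illustration, not how gdb or lldb implement it: it recognizes only two common encodings, the 5-byte `call rel32` (opcode 0xE8) and the register form of `call r/m64` (0xFF with a ModRM byte in 0xD0..0xD7), and `read_memory` is a hypothetical callback supplied by the caller.

```python
import struct
from typing import Callable, Optional

# Hypothetical memory reader: (address, size) -> bytes, or None if unreadable.
ReadMemory = Callable[[int, int], Optional[bytes]]


def follows_call(read_memory: ReadMemory, ret_addr: int) -> bool:
    """True if the bytes just before ret_addr look like an x86-64 call."""
    direct = read_memory(ret_addr - 5, 5)
    if direct is not None and len(direct) == 5 and direct[0] == 0xE8:
        return True  # call rel32
    indirect = read_memory(ret_addr - 2, 2)
    if indirect is not None and len(indirect) == 2:
        # FF /2 with a register operand: ModRM = 11 010 rrr (0xD0..0xD7)
        return indirect[0] == 0xFF and (indirect[1] & 0xF8) == 0xD0
    return False


def guess_return_pc(read_memory: ReadMemory, sp: int) -> Optional[int]:
    """Treat the 8-byte little-endian word at sp as a return address if it
    immediately follows something that looks like a call instruction."""
    raw = read_memory(sp, 8)
    if raw is None or len(raw) != 8:
        return None
    candidate = struct.unpack("<Q", raw)[0]
    return candidate if follows_call(read_memory, candidate) else None
```

A byte-pattern check like this can produce false positives (any 0xE8 byte five bytes back will pass), so a real unwinder would presumably treat the result as a last resort and validate it by trying to unwind the caller.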
