On Wed, 2017-04-26 at 17:27 +0200, Ulf Hermann wrote: > However, if you strip .eh_frame and .eh_frame_hdr from the exe (thus > triggering the fp unwinding on the first frame), you will see that it > skips sigusr2. At the same time it invents another frame 0x403f40 on > the main thread. Apparently pthread_join creates two stack frames. As > it correctly unwinds the rest, the latter seemed harmless to me.
I am a little concerned about testing against an exec where .eh_frame is forcibly removed since that is an allocated section you are messing up the binary (which shows because the symbol table doesn't match anymore). It seems nicer to do the checks instead with a hacked up libdwfl/frame_unwind.c that simply doesn't handle cfi and so always uses the frame pointer unwinder: diff --git a/libdwfl/frame_unwind.c b/libdwfl/frame_unwind.c index fb42c1a..6de64e5 100644 --- a/libdwfl/frame_unwind.c +++ b/libdwfl/frame_unwind.c @@ -539,6 +539,7 @@ new_unwound (Dwfl_Frame *state) static void handle_cfi (Dwfl_Frame *state, Dwarf_Addr pc, Dwarf_CFI *cfi, Dwarf_Addr bias) { + return; Dwarf_Frame *frame; if (INTUSE(dwarf_cfi_addrframe) (cfi, pc, &frame) != 0) { You are right that in that case we loose/skip over sigusr2 from raise and end up at stdarg directly if we remove the pc & 0x1 check. But... that really is because we deliberately skip it. Proper/simple link-register/frame unwinding should say: - newPc = newLr & (~0x1); + newPc = lr; The newPc is the current link register, not the new one. With that we get the backtrace as expected. But... I now realize why you needed something like that in the case of mixed CFI/no-framepointer/no-CFI/framepointer code. Like we have in our testcase. In that case there is no good way to determine whether or not there really were proper frame pointers and/or how the previous frame was unwound. Our testcase is somewhat mean by using some signal/no-return code which, which is hard to properly unwind without full frame pointers or full CFI. And with the simpler code that doesn't try to guess whether or not to skip a frame you do end up with an extra siguser2 and/or main frame. We could try to be clever and realize the link register and pc are the same and then use the newLR instead as newPC. That however might just mean that it is a recursive call to the same function. So maybe the proper "fix" for that is to make our testcase a little less strict and allow the occasional extra frame instead of trying to make the frame pointer unwinder "extra smart". Maybe something like the attached patch? Cheers, Mark
diff --git a/backends/aarch64_unwind.c b/backends/aarch64_unwind.c index cac4ebd..18aaf9a 100644 --- a/backends/aarch64_unwind.c +++ b/backends/aarch64_unwind.c @@ -61,26 +61,15 @@ EBLHOOK(unwind) (Ebl *ebl __attribute__ ((unused)), Dwarf_Addr pc __attribute__ Dwarf_Word newPc, newLr, newFp, newSp; - // The initial frame is special. We are expected to return lr directly in this case, and we'll - // come back to the same frame again in the next round. - if ((pc & 0x1) == 0) - { - newLr = lr; - newFp = fp; - newSp = sp; - } - else - { - if (!readfunc(fp + LR_OFFSET, &newLr, arg)) - newLr = 0; - - if (!readfunc(fp + FP_OFFSET, &newFp, arg)) - newFp = 0; - - newSp = fp + SP_OFFSET; - } - - newPc = newLr & (~0x1); + if (!readfunc(fp + LR_OFFSET, &newLr, arg)) + newLr = 0; + + if (!readfunc(fp + FP_OFFSET, &newFp, arg)) + newFp = 0; + + newSp = fp + SP_OFFSET; + + newPc = lr; if (!setfunc(-1, 1, &newPc, arg)) return false; @@ -91,6 +80,5 @@ EBLHOOK(unwind) (Ebl *ebl __attribute__ ((unused)), Dwarf_Addr pc __attribute__ // If the fp is invalid, we might still have a valid lr. // But if the fp is valid, then the stack should be moving in the right direction. - // Except, if this is the initial frame. Then the stack doesn't move. - return newPc != 0 && (fp == 0 || newSp > sp || (pc & 0x1) == 0); + return newPc != 0 && (fp == 0 || newSp > sp); } diff --git a/tests/backtrace-subr.sh b/tests/backtrace-subr.sh index a303e32..9731c43 100644 --- a/tests/backtrace-subr.sh +++ b/tests/backtrace-subr.sh @@ -59,7 +59,7 @@ check_backtracegen() # Ignore it here as it is a bug of OS, not a bug of elfutils. check_err() { - if [ $(egrep -v <$1 'dwfl_thread_getframes: (No DWARF information found|no matching address range|address out of range)$' \ + if [ $(egrep -v <$1 'dwfl_thread_getframes: (No DWARF information found|no matching address range|address out of range|Invalid register)$' \ | wc -c) \ -eq 0 ] then diff --git a/tests/backtrace.c b/tests/backtrace.c index 1ff6353..21abe8a 100644 --- a/tests/backtrace.c +++ b/tests/backtrace.c @@ -90,6 +90,10 @@ callback_verify (pid_t tid, unsigned frameno, Dwarf_Addr pc, return; } Dwfl_Module *mod; + /* See case 4. Special case to help out simple frame pointer unwinders. */ + static bool duplicate_sigusr2 = false; + if (duplicate_sigusr2) + frameno--; static bool reduce_frameno = false; if (reduce_frameno) frameno--; @@ -125,6 +129,14 @@ callback_verify (pid_t tid, unsigned frameno, Dwarf_Addr pc, } /* FALLTHRU */ case 4: + /* Some simple frame unwinders get this wrong and think sigusr2 + is calling itself again. Allow it and just pretend there is + an extra sigusr2 frame. */ + if (symname != NULL && strcmp (symname, "sigusr2") == 0) + { + duplicate_sigusr2 = true; + break; + } assert (symname != NULL && strcmp (symname, "stdarg") == 0); break; case 5: