On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov <tetra2005.patc...@gmail.com> wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu <hjl.to...@gmail.com> wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov <tetra2005.patc...@gmail.com> >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: >>>> >>>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov <tetra2005.patc...@gmail.com> >>>> wrote: >>>>> >>>>> On 02.08.2017 21:48, H.J. Lu wrote: >>>>>> >>>>>> >>>>>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >>>>>> <tetra2005.patc...@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 02.08.2017 20:02, Jeff Law wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I've rebased the previous patch to trunk per Andrew's suggestion. >>>>>>>>>>> Original patch description/motivation/questions are in >>>>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Is his stuff used for exception handling? If so, doesn't that make >>>>>>>>>> the >>>>>>>>>> performance a significant concern (ie that msync call?) >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> For the performance issue, I wonder if it wouldn't be better to just >>>>>>>>> compile the unwinder twice, basically include everything needed for >>>>>>>>> the >>>>>>>>> verification backtrace in a single *.c file with special defines, and >>>>>>>>> not export anything from that *.o file except the single entrypoint. >>>>>>>>> That would also handle the ABI problems. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Yea. >>>>>>>> >>>>>>>> Given that the vast majority of the time the addresses are valid, I >>>>>>>> wonder if we could take advantage of that property to keep the >>>>>>>> overhead >>>>>>>> lower in the most common cases. >>>>>>>> >>>>>>>>> For the concurrent calls of other threads unmapping stacks of running >>>>>>>>> threads (that is where most of the verification goes against), I'm >>>>>>>>> afraid >>>>>>>>> that is simply non-solveable, even if you don't cache anything, it >>>>>>>>> will >>>>>>>>> still be racy. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Absolutely -- I think solving this stuff 100% reliably without races >>>>>>>> isn't possible. I think the question is can we make this >>>>>>>> significantly >>>>>>>> better. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> All, >>>>>>> >>>>>>> First of all, thanks for the feedback. This is indeed not a 100% >>>>>>> robust >>>>>>> solution, just something to allow tools like Asan to produce more sane >>>>>>> backtraces on crash (currently when stack is corrupted they would crash >>>>>>> in >>>>>>> the unwinder without printing at least partial backtrace). >>>>>>> >>>>>> >>>>>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >>>>>> corrupted: >>>>>> >>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >>>>>> >>>>>> There is very little point to unwind stack when stack is corrupted. >>>>> >>>>> >>>>> >>>>> I'd argue that situation with __stack_chk_fail is very special. It's >>>>> used >>>>> in production code where you'd want to halt as soon as corruption is >>>>> detected to prevent further damage so even printing message is an >>>>> additional >>>>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>>>> to >>>>> print as many survived frames as possible. >>>>> >>>> >>>> But stack unwinder in libgcc is primarily for production use, not for >>>> debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and >>> seems to be meant primarily for debugging (at least many projects use it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for >>> libbacktrace which, as its README mentions, is meant to "print a detailed >>> backtrace when an error occurs or to gather detailed profiling information". >> >> But it shouldn't be performed on a corrupted stack. When a stack is >> corrupted, there is just no way to reliably unwind it. > > Why not? Attached patch shows that it's possible (up to a point of > corruption of course). > >> It isn't a problem >> for a debugger which is designed to analyze corrupted program. > > Yes but forcing all applications that would like to print helpful > message on stack corruption to employ gdb sounds less efficient that > providing fail-safe APIs in standard library... ^^^^^^^^^^^ There is no such a thing of fail-safe on corrupted stack. A bad, but valid address, can be put on corrupted stack. When you unwind it, the unwinder may not crash, but give you bogus stack backtrace.
-- H.J.