Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind

H.J. Lu Thu, 03 Aug 2017 07:55:30 -0700

On Thu, Aug 3, 2017 at 7:17 AM, Yury Gribov <tetra2005.patc...@gmail.com> wrote:
> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov <tetra2005.patc...@gmail.com> 
>> wrote:
>>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov <tetra2005.patc...@gmail.com> 
>>>> wrote:
>>>>> On 02.08.2017 23:04, H.J. Lu wrote:
>>>>>>
>>>>>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov <tetra2005.patc...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 02.08.2017 21:48, H.J. Lu wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov
>>>>>>>> <tetra2005.patc...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 02.08.2017 20:02, Jeff Law wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 07/17/2017 01:23 AM, Yuri Gribov wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've rebased the previous patch to trunk per Andrew's suggestion.
>>>>>>>>>>>>> Original patch description/motivation/questions are in
>>>>>>>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is his stuff used for exception handling?  If so, doesn't that make
>>>>>>>>>>>> the
>>>>>>>>>>>> performance a significant concern (ie that msync call?)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For the performance issue, I wonder if it wouldn't be better to just
>>>>>>>>>>> compile the unwinder twice, basically include everything needed for
>>>>>>>>>>> the
>>>>>>>>>>> verification backtrace in a single *.c file with special defines, 
>>>>>>>>>>> and
>>>>>>>>>>> not export anything from that *.o file except the single entrypoint.
>>>>>>>>>>> That would also handle the ABI problems.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yea.
>>>>>>>>>>
>>>>>>>>>> Given that the vast majority of the time the addresses are valid, I
>>>>>>>>>> wonder if we could take advantage of that property to keep the
>>>>>>>>>> overhead
>>>>>>>>>> lower in the most common cases.
>>>>>>>>>>
>>>>>>>>>>> For the concurrent calls of other threads unmapping stacks of 
>>>>>>>>>>> running
>>>>>>>>>>> threads (that is where most of the verification goes against), I'm
>>>>>>>>>>> afraid
>>>>>>>>>>> that is simply non-solveable, even if you don't cache anything, it
>>>>>>>>>>> will
>>>>>>>>>>> still be racy.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Absolutely -- I think solving this stuff 100% reliably without races
>>>>>>>>>> isn't possible.  I think the question is can we make this
>>>>>>>>>> significantly
>>>>>>>>>> better.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> All,
>>>>>>>>>
>>>>>>>>> First of all, thanks for the feedback.  This is indeed not a 100%
>>>>>>>>> robust
>>>>>>>>> solution, just something to allow tools like Asan to produce more sane
>>>>>>>>> backtraces on crash (currently when stack is corrupted they would 
>>>>>>>>> crash
>>>>>>>>> in
>>>>>>>>> the unwinder without printing at least partial backtrace).
>>>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is
>>>>>>>> corrupted:
>>>>>>>>
>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752
>>>>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189
>>>>>>>>
>>>>>>>> There is very little point to unwind stack when stack is corrupted.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'd argue that situation with __stack_chk_fail is very special.  It's
>>>>>>> used
>>>>>>> in production code where you'd want to halt as soon as corruption is
>>>>>>> detected to prevent further damage so even printing message is an
>>>>>>> additional
>>>>>>> risk.  Debugging tools (e.g. sanitizers) are different and would prefer
>>>>>>> to
>>>>>>> print as many survived frames as possible.
>>>>>>>
>>>>>>
>>>>>> But stack unwinder in libgcc is primarily for production use, not for
>>>>>> debugging.
>>>>>
>>>>>
>>>>> I can see it used in different projects to print backtraces on error 
>>>>> (bind9,
>>>>> SimGrid, sanitizers).  backtrace(3) also sits on top of libgcc unwinder 
>>>>> and
>>>>> seems to be meant primarily for debugging (at least many projects use it 
>>>>> for
>>>>> this purpose:
>>>>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same 
>>>>> for
>>>>> libbacktrace which, as its README mentions, is meant to "print a detailed
>>>>> backtrace when an error occurs or to gather detailed profiling 
>>>>> information".
>>>>
>>>> But it shouldn't be performed on a corrupted stack.  When a stack is
>>>> corrupted, there is just no way to reliably unwind it.
>>>
>>> Why not? Attached patch shows that it's possible (up to a point of
>>> corruption of course).
>>>
>>>> It isn't a problem
>>>> for a debugger which is designed to analyze corrupted program.
>>>
>>> Yes but forcing all applications that would like to print helpful
>>> message on stack corruption to employ gdb sounds less efficient that
>>> providing fail-safe APIs in standard library...
>>                 ^^^^^^^^^^^  There is no such a thing of fail-safe on 
>> corrupted
>> stack.   A bad, but valid address, can be put on corrupted stack.  When
>> you unwind it, the unwinder may not crash, but give you bogus stack
>> backtrace.
>
> Agreed but at least my users were fine with this.  Having at least
> partial stack is very helpful when you app fails early during OS
> startup with no chance to attach debugger.
>


Unwinder in libgcc is used for C++ exception support, which assumes
that the unwind information is correct, including stack.  Should we
slowdown C++ exception to check fir corrupted unwind info, which
can't be done 100%?

-- 
H.J.

Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind

Reply via email to