--Andy

> On Jun 29, 2017, at 2:41 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> 
>> On Thu, Jun 29, 2017 at 02:09:54PM -0700, Andy Lutomirski wrote:
>>> On Thu, Jun 29, 2017 at 12:05 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
>>>> On Thu, Jun 29, 2017 at 11:50:18AM -0700, Andy Lutomirski wrote:
>>>>> On Thu, Jun 29, 2017 at 10:53 AM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
>>>>> There's a bug here that will need a small change to the entry code.
>>>>> 
>>>>> Mike Galbraith reported:
>>>>> 
>>>>>  WARNING: can't dereference registers at ffffc900089d7e08 for ip ffffffff81740bbb
>>>>> 
>>>>> After some looking I found that it's caused by the following code
>>>>> snippet in the 'interrupt' macro in entry_64.S:
>>>>> 
>>>>>        /*
>>>>>         * Save previous stack pointer, optionally switch to interrupt stack.
>>>>>         * irq_count is used to check if a CPU is already on an interrupt stack
>>>>>         * or not. While this is essentially redundant with preempt_count it is
>>>>>         * a little cheaper to use a separate counter in the PDA (short of
>>>>>         * moving irq_enter into assembly, which would be too much work)
>>>>>         */
>>>>>        movq    %rsp, %rdi
>>>>>        incl    PER_CPU_VAR(irq_count)
>>>>>        cmovzq  PER_CPU_VAR(irq_stack_ptr), %rsp
>>>>>        UNWIND_HINT_REGS base=rdi
>>>>>        pushq   %rdi
>>>>>        UNWIND_HINT_REGS indirect=1
>>>>> 
>>>>> The problem is that it's changing the stack pointer *before* writing the
>>>>> previous stack pointer (push %rdi).  So when unwinding from an NMI which
>>>>> hit between the rsp write and the rdi push, the unwinder tries to access
>>>>> the regs on the previous stack (by reading rdi), but the previous stack
>>>>> pointer isn't there yet, so the access is considered out of bounds.
>>>> 
>>>> Ugh, that code.  Does this problem go away with this patch applied:
>>>> 
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=2231ec7e0bcc1a2bc94a17081511ab54cc6badd1
>>>> 
>>>> If so, want to update the patch for new kernels (shouldn't conflict
>>>> with anything except your unwind hints)?
>>> 
>>> I don't think that patch will fix it, because it still updates rsp
>>> *before* writing the old rsp on the new stack.  So there's still a
>>> window where the "previous stack" pointer is missing.
>> 
>> But it's in a register.  Is undwarf not able to grok that?
> 
> Sorry, I didn't explain it very well.  Undwarf can find the regs pointer
> in rdi, it just doesn't trust its value.
> 
> See the stack_info.next_sp field, which is set in in_irq_stack():
> 
>    /*
>     * The next stack pointer is the first thing pushed by the entry code
>     * after switching to the irq stack.
>     */
>    info->next_sp = (unsigned long *)*(end - 1);
> 
> It's a safety mechanism.  The unwinder needs the last word of the irq
> stack page to point to the previous stack.  That way it can double check
> that the stack pointer it calculates is within the bounds of either the
> current stack or the previous stack.
> 
> In the above code, the previous stack pointer (or next stack pointer,
> depending on your perspective) hasn't been set up before it switches
> stacks.  So the unwinder reads an uninitialized value into
> info->next_sp, and compares that with the regs pointer, and then stops
> the unwind because it thinks it went off into the weeds.
> 
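
The next_sp safety check described above can be modelled in user space
roughly as follows. This is only a sketch of the idea, not the kernel's
code: struct stack_model, load_next_sp(), on_stack() and
regs_ptr_trusted() are hypothetical names, and the real unwinder looks
up the previous stack from next_sp rather than being handed it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct stack_info. */
struct stack_model {
	uintptr_t *begin;	/* lowest word of the stack */
	uintptr_t *end;		/* one past the highest word */
	uintptr_t next_sp;	/* saved pointer into the previous stack */
};

/* Mirrors the quoted in_irq_stack() line: read the last word of the stack. */
static void load_next_sp(struct stack_model *s)
{
	assert(s->begin < s->end);
	s->next_sp = *(s->end - 1);
}

/* True if addr lies within [begin, end) of the given stack. */
static bool on_stack(const struct stack_model *s, uintptr_t addr)
{
	return addr >= (uintptr_t)s->begin && addr < (uintptr_t)s->end;
}

/*
 * Simplified version of the check: a regs pointer is trusted only if it
 * is on the current stack, or if next_sp really points into the previous
 * stack and the regs pointer is on that previous stack too.
 */
static bool regs_ptr_trusted(const struct stack_model *cur,
			     const struct stack_model *prev,
			     uintptr_t regs)
{
	if (on_stack(cur, regs))
		return true;
	return on_stack(prev, cur->next_sp) && on_stack(prev, regs);
}
```

In this model, if the last word of the irq stack has not been written
yet, next_sp does not point into the previous stack, so a perfectly good
regs pointer on the previous stack is rejected. That corresponds to the
window between the rsp write and the rdi push.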

That should be manageable, though, I think.  With my patch applied (and maybe 
even without it), the only exception to that rule is if regs->sp points just 
above the top of the IRQ stack and the next instruction is push reg.  In that 
case, the reg is exactly as trustworthy as the normal rule.*  Can you teach the 
unwinding code that this is okay?

* If an NMI hits right there, then it relies on unwinding out of the NMI 
correctly.  But the usual checks that the target stack is a valid stack should 
prevent us from going off into the weeds regardless.
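
One way to picture the exception proposed above is the following rough
sketch. Everything in it is hypothetical and for illustration only: the
struct and function names, the irq_empty_sp parameter (the value rsp
holds right after the switch, before anything is pushed), and the
assumption that the unwinder can tell that the next instruction is the
push. It is not the actual unwinder code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address range [begin, end) describing a valid stack. */
struct range {
	uintptr_t begin;
	uintptr_t end;
};

static bool in_range(struct range r, uintptr_t addr)
{
	return addr >= r.begin && addr < r.end;
}

/* Hypothetical snapshot of the context an NMI interrupted. */
struct entry_snapshot {
	uintptr_t sp;			/* regs->sp at the time of the NMI */
	uintptr_t saved_reg;		/* register holding the old rsp (rdi above) */
	bool next_insn_is_push;		/* assumed known, e.g. from an unwind hint */
};

enum prev_sp_source {
	PREV_SP_FROM_STACK,	/* normal rule: last word of the irq stack */
	PREV_SP_FROM_REG,	/* exception: still only in the register */
	PREV_SP_UNKNOWN,	/* neither candidate is on a valid stack */
};

static enum prev_sp_source pick_prev_sp(uintptr_t irq_empty_sp,
					struct range task,
					const struct entry_snapshot *s,
					uintptr_t stack_word)
{
	/* Window: rsp already switched, old rsp not pushed yet. */
	if (s->sp == irq_empty_sp && s->next_insn_is_push)
		return in_range(task, s->saved_reg) ? PREV_SP_FROM_REG
						    : PREV_SP_UNKNOWN;

	/* Otherwise the word pushed by the entry code is the source. */
	return in_range(task, stack_word) ? PREV_SP_FROM_STACK
					  : PREV_SP_UNKNOWN;
}
```

Either way the candidate pointer is still checked against a known valid
stack, which is the "usual checks" part of the footnote.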

> -- 
> Josh
