* Peter Zijlstra <[email protected]> wrote:

> On Fri, Nov 27, 2015 at 09:38:11AM +0100, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <[email protected]> wrote:
> > 
> > > On Thu, Nov 19, 2015 at 11:23:00AM +0100, Ingo Molnar wrote:
> > > > PEBS is an asynchronous hardware tracing mechanism, when batched PEBS
> > > > is used it might not even result in any interruption of execution. The
> > > > 'pt_regs' does not necessarily correspond to an interrupted,
> > > > restartable context - we take the RIP from the PEBS machinery and also
> > > > use LBR and disassembly to determine the previous instruction, before
> > > > reporting it to user-space.
> > > 
> > > Note that modern PEBS hardware (hsw+) does the rollback in hardware.
> > > Prior to that we indeed do it manually using the LBR.
> > > 
> > > As to pt_regs, we construct a franken pt_regs based on the actual PEBS
> > > buffer overflow PMI and bits from the PEBS record (which also includes
> > > some register state). See
> > > arch/x86/kernel/cpu/perf_event_intel_ds.c:setup_pebs_sample_data().
> > > 
> > > We always copy the flags, ip, bp and sp from the PEBS record into the
> > > interrupt pt_regs.
> > > 
> > > And note that the PEBS record is constructed at instruction retirement,
> > > so it shows the state _after_ the instruction, with exception of the
> > > (hsw+) real_ip field.
> > > 
> > > So the unwinder will have to be taught that if the IP points at a stack
> > > altering instruction (call, push, etc.) it will have to 'undo' the
> > > effects on the actual stack (I appreciate this might be 'interesting'
> > > for things like: pop, ret, etc.).
> > 
> > So do we dump both the 'real' and the actual RIP, to not force tooling into
> > having to decode instructions and such?
> 
> Nope, we only expose the corrected one.
> 
> > (Which is pretty hard and fragile and not always 
> > possible with instructions that destroy the original RIP, like JMP, etc.)
> 
> Not sure what you're getting at here. We don't need the uncorrected
> instruction.

Well, we need it for stack unwinding, as you point out:

> But the problem here is that we rewind the instruction stream, but not
> the stack. And the stack unwinder is (obviously) interested in the stack
> state.

Unwinding the stack state would fix it - but an equivalent solution would be
to pass along the original RIP as well: we'd then have a self-consistent pair
of RIP/RSP.

Especially since unwinding the RSP is probably hard:

> I'm not sure we want (or need) to go undo the specific instruction's
> stack effect in-kernel. If the !DWARF unwinders are similarly confused
> we might need to put it in kernel (expensive *groan*). If it's only the
> DWARF muck then it's something that can be done in userspace just
> fine, although we might need to copy slightly more of the stack than SP
> is pointing at, such that we can undo RET/POP etc. which would have data
> beyond the head of stack.
> 
> The easiest solution might be to figure out the biggest stack offset for
> any instruction and always capture that much over the head of stack.

So I think the problem here is that the RSP does not match up with the RIP.
We can either pass along the original RIP+RSP, or the fixed-up pair - but
what we do currently is pass along only half of it, which corrupts the DWARF
unwinding state, as it doesn't tolerate such errors.
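
To make the mismatch concrete, here is a rough user-space sketch of what
"fixing up the other half" would involve. It is an illustration only, not
kernel code: the names (insn_kind, rsp_delta, rsp_before_insn) are made up,
and it assumes only the plain near forms of PUSH/POP/CALL/RET in 64-bit
mode, each moving RSP by one 8-byte slot. Real instructions (RET imm16,
16-bit operand sizes, ENTER/LEAVE, etc.) would need an actual decoder.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: hypothetical names, not kernel code. Models how far
 * RSP moves when a simple 64-bit stack-altering instruction retires.
 */
enum insn_kind {
	INSN_OTHER,
	INSN_PUSH,
	INSN_POP,
	INSN_CALL,
	INSN_RET,
	INSN_KIND_MAX
};

/* Change applied to RSP by the instruction (plain near forms only). */
static int64_t rsp_delta(enum insn_kind kind)
{
	assert(kind < INSN_KIND_MAX);

	switch (kind) {
	case INSN_PUSH:
	case INSN_CALL:
		return -8;	/* stack grows down: one 8-byte slot pushed */
	case INSN_POP:
	case INSN_RET:
		return 8;	/* one 8-byte slot popped */
	default:
		return 0;	/* no stack effect modelled */
	}
}

/*
 * Given the RSP recorded after retirement, recover the RSP that matches
 * the rewound RIP (the one pointing at the instruction itself).
 */
static uint64_t rsp_before_insn(uint64_t rsp_after, enum insn_kind kind)
{
	return rsp_after - (uint64_t)rsp_delta(kind);
}
```

So for a PUSH sampled with a post-retirement RSP of 0x7ffc0ff8, the RSP
consistent with the rewound RIP would be 0x7ffc1000. For POP/RET the fixed-up
RSP moves back down, and the popped slot may already have been overwritten,
which is why undoing those is the 'interesting' case.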

Thanks,

        Ingo