Andy Lutomirski <l...@amacapital.net> writes:

> I don't really care about the number of instructions.

Right, a couple of test/jz/jnz instructions are negligible in the exception
path; that's what I think as well.

>  But there are still all the nasty cases:
>
>  - Context switch during exception processing (both in the C handler
> and in the retint code).
>  - PMI during exception processing.
>  - Exception while perf is poking at LBR msrs.

Yes.
Wasn't that what Thomas's per-cpu variable suggestion was meant to solve?
I.e.:
        DEFINE_PER_CPU(unsigned long, lbr_dump_state) = LBR_OOPS_DISABLED;
        ...

We would have an "LBR resource" variable to track who owns the LBR:
 - nobody: LBR_UNCLAIMED
 - the exception handler: LBR_EXCEPTION_DEBUG_USAGE
   - activated with a runtime variable or config
   - impossible to activate if perf holds it
 - the perf code: LBR_PERF_USAGE
   - activated through the perf infrastructure
   - impossible to activate if the exception handler holds it
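
As a rough sketch of that ownership rule, here is a user-space C model. It is
only an illustration: lbr_owner_state, lbr_claim() and lbr_release() are
made-up names, and a single C11 atomic stands in for one CPU's slot of the
per-cpu variable (in the kernel this would presumably be a per-cpu cmpxchg
such as this_cpu_cmpxchg()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Owner states, named after the list above. */
enum lbr_owner {
	LBR_UNCLAIMED,
	LBR_EXCEPTION_DEBUG_USAGE,
	LBR_PERF_USAGE,
};

/* Hypothetical stand-in for one CPU's slot of the per-cpu variable. */
static _Atomic int lbr_owner_state = LBR_UNCLAIMED;

/* Try to take the LBR for 'who'; fails if anybody already holds it. */
static bool lbr_claim(enum lbr_owner who)
{
	int expected = LBR_UNCLAIMED;

	assert(who != LBR_UNCLAIMED);
	return atomic_compare_exchange_strong(&lbr_owner_state, &expected, who);
}

/* Give the LBR back; succeeds only if 'who' is the current owner. */
static bool lbr_release(enum lbr_owner who)
{
	int expected = who;

	return atomic_compare_exchange_strong(&lbr_owner_state, &expected,
					      LBR_UNCLAIMED);
}
```

With this, whoever claims first wins, and the other side gets a refusal until
the owner releases, which is the mutual exclusion described above.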

Now this solves the perf/exception concurrency on the LBR registers. If there is
a reschedule or a PMI during the exception, can that have an impact?
 - case 1: nobody is handling LBR
   => no impact, exception handlers won't touch LBR
 - case 2: perf is handling LBR
   => no impact, exception handlers won't touch LBR

 - case 3: exception handlers are handling LBR

   - case 3a: simple user exception
       -> exception entry
       -> is kernel exception == false => bypass LBR handling
       -> exception handling

   - case 3b: simple kernel exception
       -> exception entry
       -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
       -> no reschedule, no PMI
       -> exception handling
       -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR

   - case 3c: kernel exception with PMI
       -> exception entry
       -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
       -> PMI
          can't touch LBR, as lbr_dump_state == EXCEPTION_OWNED
       -> exception handling
       -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR

   - case 3d: kernel exception with a reschedule inside
       -> exception entry
       -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
       -> exception handling
       -> context_switch()
          -> perf cannot touch LBR, nobody can
       -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR
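
The cases above can be modelled in a few lines of user-space C. This is only a
sketch under my own assumptions: struct lbr_cpu, exception_entry(),
exception_exit() and perf_touch_lbr() are hypothetical names, a bool stands in
for the LBR enable bit, and EXCEPTION_OWNED above is taken to mean the
LBR_EXCEPTION_DEBUG_USAGE state.

```c
#include <assert.h>
#include <stdbool.h>

enum lbr_owner { LBR_UNCLAIMED, LBR_EXCEPTION_DEBUG_USAGE, LBR_PERF_USAGE };

/* Simulated per-CPU state: who owns the LBR and whether it is recording. */
struct lbr_cpu {
	enum lbr_owner owner;
	bool recording;
};

/* Entry: freeze the LBR only for kernel exceptions when we own it. */
static void exception_entry(struct lbr_cpu *cpu, bool from_kernel)
{
	if (from_kernel && cpu->owner == LBR_EXCEPTION_DEBUG_USAGE)
		cpu->recording = false;		/* STOP LBR */
}

/* Exit: restart the LBR under the same test as on entry. */
static void exception_exit(struct lbr_cpu *cpu, bool from_kernel)
{
	if (from_kernel && cpu->owner == LBR_EXCEPTION_DEBUG_USAGE) {
		/* Nobody may have restarted it behind our back. */
		assert(!cpu->recording);
		cpu->recording = true;		/* START LBR */
	}
}

/*
 * perf side (PMI or context switch): acts only if perf owns the LBR.
 * Returns true if it actually touched the LBR state.
 */
static bool perf_touch_lbr(struct lbr_cpu *cpu, bool enable)
{
	if (cpu->owner != LBR_PERF_USAGE)
		return false;
	cpu->recording = enable;
	return true;
}
```

Cases 3c and 3d then amount to calling perf_touch_lbr() between
exception_entry() and exception_exit() and seeing that it is refused.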

I might be very wrong in the description as I'm not that sharp on x86, but is
there a flaw in the above cases?

If not, a couple of tests and Thomas's per-cpu variable can solve the issue,
while keeping the exception handler code as simple as Emmanuel has proposed
(given the inclusion of the additional test, which will be designed not to
pollute the LBR), and with only a small impact on perf to solve the resource
acquisition issue.

Cheers.

--
Robert