Andy Lutomirski <l...@amacapital.net> writes:

> I don't really care about the number of instructions.

Right, a couple of test/jz/jnz is negligible in the exception path, that's what
I also think.

> But there are still all the nasty cases:
>
>  - Context switch during exception processing (both in the C handler
>    and in the retint code).
>  - PMI during exception processing.
>  - Exception while perf is poking at LBR msrs.

Yes. Wasn't that what Thomas's suggestion of the per-cpu variable was solving?
I.e.:
  DEFINE_PER_CPU(unsigned long, lbr_dump_state) = LBR_OOPS_DISABLED;
  ...

We would have an "LBR resource" variable to track who owns the LBR:
 - nobody: LBR_UNCLAIMED
 - the exception handler: LBR_EXCEPTION_DEBUG_USAGE
   - activated with a runtime variable or config
   - impossible to activate if perf holds it
 - the perf code: LBR_PERF_USAGE
   - activated through the perf infrastructure
   - impossible to activate if the exception handler holds it

Now this solves the perf/exception concurrency on the LBR registers. If there is
a reschedule during the exception, or a PMI, can that have an impact?
 - case 1: nobody is handling LBR
   => no impact, exception handlers won't touch LBR
 - case 2: perf is handling LBR
   => no impact, exception handlers won't touch LBR
 - case 3: exception handlers are handling LBR
   - case 3a: simple user exception
     -> exception entry
     -> is kernel exception == false => bypass LBR handling
     -> exception handling
   - case 3b: simple kernel exception
     -> exception entry
     -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
     -> no reschedule, no PMI
     -> exception handling
     -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR
   - case 3c: kernel exception with PMI
     -> exception entry
     -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
     -> PMI can't touch LBR, as lbr_dump_state == EXCEPTION_OWNED
     -> exception handling
     -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR
   - case 3d: kernel exception with a reschedule inside
     -> exception entry
     -> test lbr_dump_state == EXCEPTION_OWNED => true => STOP LBR
     -> exception handling
     -> context_switch()
        -> perf cannot touch LBR, nobody can
     -> test lbr_dump_state == EXCEPTION_OWNED => true => START LBR

I might be very wrong in the description as I'm not that sharp on x86, but is
there a flaw in the above cases?

If not, a couple of tests and Thomas's per-cpu variable can solve the issue,
while keeping the exception handler code simple as Emmanuel has proposed (with
the additional tests included, which will be designed not to pollute the LBR),
and with only a small impact on perf to solve the resource acquisition issue.

Cheers.

-- 
Robert
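
For illustration only, the ownership rule discussed above (one owner at a time, and a claim fails if someone else already holds the LBR) could be modelled in userspace C roughly as below. This is a sketch under assumptions, not kernel code: the single atomic stands in for one CPU's per-cpu variable, and lbr_owner_state, lbr_claim(), lbr_release() and lbr_exception_owned() are hypothetical names that do not come from any existing patch. Only the three owner states are taken from the mail.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Owner states, named as in the mail. */
enum lbr_owner {
    LBR_UNCLAIMED,              /* nobody owns the LBR */
    LBR_EXCEPTION_DEBUG_USAGE,  /* exception handlers own it */
    LBR_PERF_USAGE,             /* perf owns it */
};

/*
 * Assumption: a single atomic models ONE cpu's copy of the per-cpu
 * variable. Real kernel code would use DEFINE_PER_CPU and per-cpu accessors.
 */
static _Atomic int lbr_owner_state = LBR_UNCLAIMED;

/* Hypothetical helper: the claim succeeds only if the LBR is unclaimed. */
static bool lbr_claim(enum lbr_owner who)
{
    int expected = LBR_UNCLAIMED;

    return atomic_compare_exchange_strong(&lbr_owner_state, &expected, who);
}

/* Hypothetical helper: only the current owner can release the LBR. */
static bool lbr_release(enum lbr_owner who)
{
    int expected = who;

    return atomic_compare_exchange_strong(&lbr_owner_state, &expected,
                                          LBR_UNCLAIMED);
}

/* The test the exception entry/exit path would do before STOP/START LBR. */
static bool lbr_exception_owned(void)
{
    return atomic_load(&lbr_owner_state) == LBR_EXCEPTION_DEBUG_USAGE;
}
```

With this model, once lbr_claim(LBR_EXCEPTION_DEBUG_USAGE) has succeeded, a later lbr_claim(LBR_PERF_USAGE) returns false until the exception side releases, which is the property cases 3c and 3d rely on.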