On Thu, Nov 29, 2018 at 02:25:33PM -0800, Andy Lutomirski wrote:
> On Thu, Nov 29, 2018 at 2:22 PM Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > On Thu, Nov 29, 2018 at 04:14:46PM -0600, Josh Poimboeuf wrote:
> > > On Thu, Nov 29, 2018 at 11:01:48PM +0100, Peter Zijlstra wrote:
> > > > On Thu, Nov 29, 2018 at 11:10:50AM -0600, Josh Poimboeuf wrote:
> > > > > On Thu, Nov 29, 2018 at 08:59:31AM -0800, Andy Lutomirski wrote:
> > > > >
> > > > > > (like pointing IP at a stub that retpolines to the target by reading
> > > > > > the function pointer, a la the unoptimizable version), then okay, I
> > > > > > guess, with only a small amount of grumbling.
> > > > >
> > > > > I tried that in v2, but Peter pointed out it's racy:
> > > > >
> > > > > https://lkml.kernel.org/r/20181126160217.gr2...@hirez.programming.kicks-ass.net
> > > >
> > > > Ah, but that is because it is a global shared trampoline.
> > > >
> > > > Each static_call has it's own trampoline; which currently reads
> > > > something like:
> > > >
> > > >     RETPOLINE_SAFE
> > > >     JMP *key
> > > >
> > > > which you then 'defuse' by writing an UD2 on. _However_, if you write
> > > > that trampoline like:
> > > >
> > > > 1:  RETPOLINE_SAFE
> > > >     JMP *key
> > > > 2:  CALL_NOSPEC *key
> > > >     RET
> > > >
> > > > and have the text_poke_bp() handler jump to 2 (a location you'll never
> > > > reach when you enter at 1), it will in fact work I think. The trampoline
> > > > is never modified and not shared between different static_call's.
> > >
> > > But after returning from the function to the trampoline, how does it
> > > return from the trampoline to the call site? At that point there is no
> > > return address on the stack.
> >
> > Oh, right, so that RET don't work. ARGH. Time to go sleep I suppose.
>
> I assume I'm missing something, but can't it just be JMP_NOSPEC *key?
> The code would call the trampoline just like any other function and,
> if the alignment is bad, we can skip patching it. And, if we want the
> performance back, maybe some day we can find a clean way to patch
> those misaligned callers, too.

Yeah, this is currently the leading contender, though I believe it will
use a direct jump like the current out-of-line implementation.

--
Josh