On Fri, Sep 05, 2025 at 10:51:03AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 04, 2025 at 05:24:10PM -0700, Kees Cook wrote:
> > +- The check-call instruction sequence must be treated a single unit: it
> > +  cannot be rearranged or split or optimized. The pattern is that
> > +  indirect calls, "call *$target", get converted into:
> > +
> > +    mov $target_expression, %target ; only present if the expression was
> > +                                    ; not already %target register
> > +    load -$offset(%target), %tmp    ; load the typeid hash at target
> > +    cmp $hash, %tmp                 ; compare expected typeid with loaded
> > +    je .Lcheck_passed               ; jump to the indirect call
> > +  .Lkcfi_trap$N:                    ; label of trap insn
> > +    trap                            ; trap on failure, but arranged so
> > +                                    ; "permissive mode" falls through
> > +  .Lkcfi_call$N:                    ; label of call insn
> > +    call *%target                   ; actual indirect call
> > +
> > +  This pattern of call immediately after trap provides for the
> > +  "permissive" checking mode automatically: the trap gets handled,
> > +  a warning emitted, and then execution continues after the trap to
> > +  the call.
>
> I know it is far too late to do anything here. But I've recently dug
> through a bunch of optimization manual and the like and that Jcc is
> about as bad as it gets :/
>
> The old optimization manual states that forward jumps are assumed
> not-taken; while backward jumps are assumed taken.
>
> The new wisdom is that any Jcc must be assumed not-taken; that is, the
> fallthrough case has the best throughput.
I would expect the cmp to be the slowest part of this sequence, and I
figured both the trap and the call would be speculation barriers? I'm not
sure, though. Is changing the sequence actually useful?

> Here we have a forward branch which is assumed taken :-(

The constraints we have are:

- Linux x86 KCFI trap handler decodes the instructions from the trap
  backwards, but it uses exact offsets (-12 and -6).
- Control flow following the trap must make the call (for warn-only mode).

If we change this, we'd need to make the insn decoder smarter so it can
look at the insn AFTER the trap ("is it a direct jump?"), and then use
this, which is ugly, but matches the second constraint:

    cmp $hash, %tmp
    jne .Ltrap
  .Lcall:
    call *%target
    jmp .Ldone
  .Ltrap:
    ud2
    jmp .Lcall
  .Ldone:

+4 bytes for x86_64 (the two extra short jmps).

-- 
Kees Cook