On Thu, 29 Nov 2018 10:00:48 -0800 Andy Lutomirski <l...@amacapital.net> wrote:
> >
> > Of course, another option is to just say "we don't do the inline case,
> > then", and only ever do a call to a stub that does a "jmp"
> > instruction.
>
> That’s not a terrible idea.

It was the implementation of my first proof of concept that kicked off
this entire idea, where others (Peter and Josh) thought it was better
to modify the calls themselves. It does improve things.

Just a reminder of the benchmarks of enabling all tracepoints (which
use indirect jumps) and running hackbench:

  No RETPOLINES:
            1.4503 +- 0.0148 seconds time elapsed  ( +-  1.02% )

  baseline RETPOLINES:
            1.5120 +- 0.0133 seconds time elapsed  ( +-  0.88% )

  Added direct calls for trace_events:
            1.5239 +- 0.0139 seconds time elapsed  ( +-  0.91% )

  With static calls:
            1.5282 +- 0.0135 seconds time elapsed  ( +-  0.88% )

  With static call trampolines:
           1.48328 +- 0.00515 seconds time elapsed  ( +-  0.35% )

  Full static calls:
           1.47364 +- 0.00706 seconds time elapsed  ( +-  0.48% )

Adding Retpolines caused a 1.5120 / 1.4503 = 1.0425 ( 4.25% ) slowdown

Trampolines made it into 1.48328 / 1.4503 = 1.0227 ( 2.27% ) slowdown

The above is the stub with the jmp case.

With full static calls 1.47364 / 1.4503 = 1.0160 ( 1.6% ) slowdown

Modifying the calls themselves does have an improvement (and this is
much greater of an improvement when I had debugging enabled).

Perhaps it's not worth the effort, but again, we do have control of
what uses this. It's not a total free-for-all.

Full results here:

  http://lkml.kernel.org/r/20181126155405.72b4f...@gandalf.local.home

Although since lore.kernel.org seems to be having issues:

  https://marc.info/?l=linux-kernel&m=154326714710686

-- Steve
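
The slowdown figures quoted above can be re-derived from the mean hackbench times. What follows is only a minimal sketch in Python, not part of the original benchmark run: the dictionary keys and the helper name `slowdown_percent` are labels made up for illustration, and the times are copied from the results above.

```python
# Mean hackbench times in seconds, copied from the results quoted above.
# The keys are illustrative labels (assumptions), not names from any tool.
times = {
    "no_retpolines": 1.4503,
    "baseline_retpolines": 1.5120,
    "direct_calls_trace_events": 1.5239,
    "static_calls": 1.5282,
    "static_call_trampolines": 1.48328,
    "full_static_calls": 1.47364,
}


def slowdown_percent(seconds, baseline):
    """Return the slowdown of `seconds` relative to `baseline`, in percent."""
    return (seconds / baseline - 1.0) * 100.0


# Every configuration is compared against the run without retpolines.
baseline = times["no_retpolines"]
for label, seconds in times.items():
    ratio = seconds / baseline
    print(f"{label:28s} {ratio:.4f}  ({slowdown_percent(seconds, baseline):+.2f}%)")
```

This only compares the means. The "+-" spread reported with each run is ignored, so small differences between neighbouring configurations should be read with that in mind.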