* Denys Vlasenko wrote:
> On 06/15/2015 10:20 PM, Ingo Molnar wrote:
> >> Actually, ecx and r11 need to be loaded first. They are not so much
> >> "restored" as "prepared for SYSRET insn". Every cycle lost in loading
> >> these delays SYSRET.
> >> [...]
> >
> > So in the typical case they will still be cached, and so their max
> > latency should be around 3 cycles.
>
> If syscall flushes caches (say, a large read), or sleeps
> and CPU schedules away, then pt_regs->ip,flags are evicted
> and need to be reloaded.
>
> > In fact because they are memory loads, they don't really have dependencies,
> > they should be available to SYSRET almost immediately,
>
> They depend on the memory data.
>
> > i.e. within a cycle - and
> > there's no reason to believe why these loads wouldn't pipeline properly and
> > parallelize with the many other things SYSRET has to do to organize a
> > return to user-space, before it can actually use the target RIP and
> > RFLAGS.
>
> This does not sound right.
>
> If it takes, say, 20 cycles to pull data from e.g. L3 cache to ECX,
> then SYSRET can't possibly complete sooner than in 20 cycles.

Yeah, that's true, but my point is: SYSRET has to do a lot of other things
(permission checks, loading the user mode state - most of which are unrelated
to R11/RCX), which take dozens of cycles, and which are probably overlapped
with any cache misses on arguments such as R11/RCX.

It's not impossible that reordering helps, for example if SYSRET has some
internal dependencies that make its parallelism worse than ideal - but I'd
complicate this code only if it gives a measurable improvement for
cache-cold syscall performance.

Thanks,

Ingo