On Mon, Oct 5, 2015 at 9:48 PM, Dave Hansen <[email protected]> wrote:
>
> Although I was probably wrong about the source of the overhead, the
> point still remains that the prefaulting is eating cycles for no
> practical benefit.

Yeah, no, I'm not disagreeing with that part, I'm just more of a "at
this point in the rc series we are probably better off reverting".
Your ext4 patch may well fix the issue, and be the right thing to do
(_regardless_ of the revert, in fact - while it might make the revert
unnecessary, it might also be a good idea even if we do revert).

The subtlety of this just worries me, and the reason I'd still be
inclined to revert is simply "it's been that way a long time, the safe
thing is to go back and take this slow".

> With "-e cycles:pp":
>> │ sub $0x8,%rsp
>> 24.57 │ stac
>> 15.49 │ mov (%rcx),%sil
>> 29.06 │ clac
>> 2.24 │ test %eax,%eax
>> 8.77 │ mov %sil,-0x1(%rbp)
>> 2.22 │ ↓ jne 66
>> │ movslq %edx,%rdx

Ok, so it really is the stac/clac that is the bulk of the cost. Hmm.
You're right that the loop there will only be executed once for your
case, so moving the stac/clac outside probably doesn't help. It
*might* still make a difference just for microarchitectural reasons
(i.e. they may cause more trouble just because they are close to an
instruction that depends on them), but it's questionable.
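
To make the "loop only runs once" point concrete, here is a minimal
userspace sketch. It is not the kernel code: real stac/clac are privileged
instructions, so they are modeled by hypothetical counting stubs
(fake_stac/fake_clac), and the two prefault_* helpers are made-up names
for illustration. It only counts how many toggles each arrangement would
issue and says nothing about the microarchitectural effects.

```c
#include <stddef.h>

/* Counts simulated stac + clac executions (hypothetical stand-ins). */
static unsigned long toggles;

static void fake_stac(void) { toggles++; }
static void fake_clac(void) { toggles++; }

/* Per-access bracketing: one stac/clac pair around every byte touched. */
static unsigned long prefault_per_access(const volatile char *buf,
                                         size_t len, size_t step)
{
    toggles = 0;
    for (size_t off = 0; off < len; off += step) {
        fake_stac();
        (void)buf[off];         /* the simulated user access */
        fake_clac();
    }
    return toggles;
}

/* Hoisted bracketing: a single stac/clac pair around the whole loop. */
static unsigned long prefault_hoisted(const volatile char *buf,
                                      size_t len, size_t step)
{
    toggles = 0;
    fake_stac();
    for (size_t off = 0; off < len; off += step)
        (void)buf[off];         /* the simulated user access */
    fake_clac();
    return toggles;
}
```

With a length that fits in one step (a single loop iteration), both
helpers report the same two toggles, so hoisting saves nothing in that
case. The counts only diverge once the loop runs more than once.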
It is a bit worrisome to see that those things are so expensive. Right
now almost all user accesses will cause *lots* of clac/stac stuff.

I originally asked Intel to do SMAP using a segment prefix, but that
was not to be..

Linus