On Mon, Mar 15, 2021 at 7:10 PM Peter Zijlstra <pet...@infradead.org> wrote:
>
> On Mon, Mar 15, 2021 at 06:04:41PM +0100, Sedat Dilek wrote:
>
> > make V=1 -j4 LLVM=1 LLVM_IAS=1
>
> So for giggles I checked, neither GCC nor LLVM seem to emit prefix NOPs
> when building with -march=sandybridge, they always use NOPL.
>
> Furthermore, the kernel explicitly sets: -falign-jumps=1
> -falign-loops=1, which, when not specified, default to 16 or so.
>
> This means that your userspace is *littered* with NOPL, even when you
> build your entire distro from source with -march=sandybridge.
> (arch/gentoo FTW I suppose).
>

That reminds me of the Git repo of the wireguard maintainer.

"x86: enable additional cpu optimizations for gcc v9.1+" [1]

You mean something like that ^^?

- Sedat -

[1] https://git.zx2c4.com/laptop-kernel/commit/?id=116badbe0a18bc36ba90acb8b80cff41f9ab0686

> (The only good news is that recent LLVM has a pass to use alternative
> instruction encoding in order to grow a basic block in size in order to
> minimize the amount of NOP it needs to emit at the end in order to
> satisfy the jump/loop alignment.)
>
> So if you *really* deeply care about NOP performance on your SNB, start
> by teaching LLVM about prefix NOPs and rebuild your complete userspace.
> At that point, you can do some trivial patches to the kernel to make it
> use -march=sandybridge and prefix NOPs too.
>
> Until that time, the vast majority of NOPs your CPU will execute will be
> NOPL.
