On Mon, Feb 5, 2018 at 2:22 PM, Linus Torvalds wrote:
>
> But I'm not timing it.

I lied.

I did this:

        for (i = 0; i < 100000; i++)
                asm(".rept 16384\n"
                    "subq $128,%rsp\n\t"
                    "pushq %rbx\n\t"
                    "pushq %r10\n\t"
                    "pushq %r11\n\t"
                    "pushq %r12\n\t"
                    "pushq %r13\n\t"
                    "pushq %r14\n\t"
                    "pushq %r15\n\t"
                    "popq %r15\n\t"
                    "popq %r14\n\t"
                    "popq %r13\n\t"
                    "popq %r12\n\t"
                    "popq %r11\n\t"
                    "popq %r10\n\t"
                    "popq %rbx\n\t"
                    "addq $128,%rsp\n\t"
                    ".endr");

and then I timed it like that, and again with an "xorq" zeroing each
register right after its "pushq".

And the timings came out the same, to within the precision of my
(crude) timing.
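
A minimal sketch of the two timed variants, assuming x86-64 with GCC or Clang and POSIX clock_gettime(). The helper names (now_ns, time_push_pop, time_push_xor_pop), the BODY_START/BODY_END macros and the reduced counts are hypothetical; the actual harness used above isn't shown. Each function returns elapsed nanoseconds; on other architectures the asm is compiled out and only the clock calls remain.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical, reduced counts: the loop in the mail used 100000
 * outer iterations of a .rept 16384 body. */
#define OUTER 1000

/* subq moves %rsp past the 128-byte red zone so the pushes
 * cannot clobber compiler data stored there. */
#define BODY_START ".rept 256\n\t" "subq $128,%%rsp\n\t"

/* Shared tail: restore the seven registers in reverse order,
 * undo the red-zone skip, and close the .rept. */
#define BODY_END \
    "popq %%r15\n\t" "popq %%r14\n\t" "popq %%r13\n\t" "popq %%r12\n\t" \
    "popq %%r11\n\t" "popq %%r10\n\t" "popq %%rbx\n\t" \
    "addq $128,%%rsp\n\t" ".endr\n\t"

static double now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Baseline: seven pushes followed by seven pops. */
double time_push_pop(void)
{
    double t0 = now_ns();
#if defined(__x86_64__)
    for (int i = 0; i < OUTER; i++)
        __asm__ __volatile__(BODY_START
            "pushq %%rbx\n\t" "pushq %%r10\n\t" "pushq %%r11\n\t"
            "pushq %%r12\n\t" "pushq %%r13\n\t" "pushq %%r14\n\t"
            "pushq %%r15\n\t"
            BODY_END : : : "cc", "memory");
#endif
    return now_ns() - t0;
}

/* Same body, but each register is zeroed right after it is saved. */
double time_push_xor_pop(void)
{
    double t0 = now_ns();
#if defined(__x86_64__)
    for (int i = 0; i < OUTER; i++)
        __asm__ __volatile__(BODY_START
            "pushq %%rbx\n\t" "xorq %%rbx,%%rbx\n\t"
            "pushq %%r10\n\t" "xorq %%r10,%%r10\n\t"
            "pushq %%r11\n\t" "xorq %%r11,%%r11\n\t"
            "pushq %%r12\n\t" "xorq %%r12,%%r12\n\t"
            "pushq %%r13\n\t" "xorq %%r13,%%r13\n\t"
            "pushq %%r14\n\t" "xorq %%r14,%%r14\n\t"
            "pushq %%r15\n\t" "xorq %%r15,%%r15\n\t"
            BODY_END : : : "cc", "memory");
#endif
    return now_ns() - t0;
}
```

Calling both and comparing the two numbers gives the same kind of rough comparison. Like the original, it does no warm-up, CPU pinning or repeated runs, so small differences between the two are noise.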

So I really do think you can just put the xor right next to the push,
and it will be effectively free.
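
Clearing right after the push is also safe: once a register has been pushed, its value lives on the stack, and the matching pop restores it no matter what the register held in between. A small sketch checking that for %rbx, assuming x86-64 and GCC-style extended asm; rbx_roundtrip and check_push_xor_pop are hypothetical names, and on other architectures the function simply returns its argument.

```c
#include <assert.h>
#include <stdint.h>

/* Load v into %rbx, run push / xor / pop, and read %rbx back. */
uint64_t rbx_roundtrip(uint64_t v)
{
#if defined(__x86_64__)
    uint64_t out;
    __asm__ __volatile__(
        "movq %1, %%rbx\n\t"
        "subq $128, %%rsp\n\t"   /* stay clear of the red zone */
        "pushq %%rbx\n\t"        /* value is now saved on the stack */
        "xorq %%rbx, %%rbx\n\t"  /* so the register can be cleared at once */
        "popq %%rbx\n\t"         /* pop restores the saved value */
        "addq $128, %%rsp\n\t"
        "movq %%rbx, %0"
        : "=r"(out)
        : "r"(v)
        : "rbx", "cc", "memory");
    return out;
#else
    return v;  /* hypothetical fallback: no x86-64 asm available */
#endif
}

/* Usage: a few bit patterns all survive the sequence unchanged. */
void check_push_xor_pop(void)
{
    const uint64_t samples[] = { 0, 1, 0xdeadbeefcafef00dULL, UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(rbx_roundtrip(samples[i]) == samples[i]);
}
```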

                Linus