On Sun, Nov 29, 2015 at 7:27 AM, barret rhoden <[email protected]> wrote:

> Is the final output actually faster?  Or does the compiler realize
> what's going on and emit the same code?  I originally had the
> cpu_to_le32() style helpers, but held off on using more of those once
> we started bringing in the 9ns stuff.
>
> If the compiler isn't being smart, then maybe we get rid of the open
> coded ones.  We'd need to see the asm.
>

I checked both the generated code and the performance improvement with
GCC 4.8.x (the compiler that ships with Ubuntu 14.04 LTS).
GCC does not optimize the open-coded version. In principle it could, but
there are many ways to open-code the same load, and the compiler would have
to recognize every one of them.



unsigned int foo(unsigned char *p)
{
    return ((unsigned int) p[0]) | ((unsigned int) p[1] << 8) |
        ((unsigned int) p[2] << 16) | ((unsigned int) p[3] << 24);
}

unsigned int fast_foo(unsigned char *p)
{
    return *(unsigned int *) p;
}

[gcc -O3 -S foo.c]

foo:
        movzbl  1(%rdi), %eax
        movzbl  2(%rdi), %edx
        sall    $8, %eax
        sall    $16, %edx
        orl     %edx, %eax
        movzbl  (%rdi), %edx
        orl     %edx, %eax
        movzbl  3(%rdi), %edx
        sall    $24, %edx
        orl     %edx, %eax
        ret

fast_foo:
        movl    (%rdi), %eax
        ret
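
One caveat on fast_foo(): the pointer cast assumes that p is suitably
aligned and that the access does not run afoul of strict aliasing. A
possible middle ground, sketched below, is a fixed-size memcpy(), which
compilers typically turn into a single load on x86-64 at -O2 and above.
The names load_le32_bytes() and load_le32_memcpy() are made up for this
illustration and are not existing helpers, and the memcpy variant only
returns the little-endian value on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Portable version: builds the value byte by byte, so the result is the
 * little-endian interpretation of p[0..3] on any host. */
static inline uint32_t load_le32_bytes(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: copies four bytes into a uint32_t. Unlike the
 * pointer cast, this is well defined for unaligned p and does not
 * violate aliasing rules. It yields the little-endian value only when
 * the host itself is little-endian (x86, for instance). */
static inline uint32_t load_le32_memcpy(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

Whether a given compiler emits a single movl for the memcpy variant is
still something to confirm in the asm, same as above.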
