On 2015-11-29 at 10:27 barret rhoden wrote:
> On 2015-11-29 at 6:59 'Davide Libenzi' via Akaros wrote:
> > On Sun, Nov 29, 2015 at 5:28 AM, barret rhoden
> > <[email protected]> wrote:
> > > http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
> > > (at least I think we do).
> > >
> >
> > I dispute almost every single point he makes there, but I kept my
> > inlines as the slow, fully open-coded version ☺
> > First, is it more code? Yes, but on many CPUs (like Intel, which is
> > what matters most) it is much faster.
>
> Is the final output actually faster? Or does the compiler realize
> what's going on and emit the same code? I originally had the
> cpu_to_le32() style helpers, but held off on using more of those once
> we started bringing in the 9ns stuff.
>
> If the compiler isn't being smart, then maybe we get rid of the open
> coded ones. We'd need to see the asm.
Quick test:

uint64_t endian_ifdef(uint64_t *u64)
{
	return le64_to_cpu(*u64);
}

uint64_t endian_open_coded(uint64_t *u64)
{
	return l64get(u64);
}

ffffffffc20f1dc0 <endian_ifdef>:
ffffffffc20f1dc0: 55 push %rbp
ffffffffc20f1dc1: 48 8b 07 mov (%rdi),%rax
ffffffffc20f1dc4: 48 89 e5 mov %rsp,%rbp
ffffffffc20f1dc7: 5d pop %rbp
ffffffffc20f1dc8: c3 retq
ffffffffc20f1dd0 <endian_open_coded>:
ffffffffc20f1dd0: 8b 57 18 mov 0x18(%rdi),%edx
ffffffffc20f1dd3: 48 8b 47 08 mov 0x8(%rdi),%rax
ffffffffc20f1dd7: 55 push %rbp
ffffffffc20f1dd8: c1 e2 08 shl $0x8,%edx
ffffffffc20f1ddb: 0b 57 10 or 0x10(%rdi),%edx
ffffffffc20f1dde: 48 c1 e0 08 shl $0x8,%rax
ffffffffc20f1de2: 48 0b 07 or (%rdi),%rax
ffffffffc20f1de5: 48 89 e5 mov %rsp,%rbp
ffffffffc20f1de8: 5d pop %rbp
ffffffffc20f1de9: c1 e2 10 shl $0x10,%edx
ffffffffc20f1dec: 48 09 d0 or %rdx,%rax
ffffffffc20f1def: 48 8b 57 28 mov 0x28(%rdi),%rdx
ffffffffc20f1df3: 48 c1 e2 08 shl $0x8,%rdx
ffffffffc20f1df7: 48 89 d1 mov %rdx,%rcx
ffffffffc20f1dfa: 8b 57 38 mov 0x38(%rdi),%edx
ffffffffc20f1dfd: 48 0b 4f 20 or 0x20(%rdi),%rcx
ffffffffc20f1e01: c1 e2 08 shl $0x8,%edx
ffffffffc20f1e04: 0b 57 30 or 0x30(%rdi),%edx
ffffffffc20f1e07: c1 e2 10 shl $0x10,%edx
ffffffffc20f1e0a: 48 09 ca or %rcx,%rdx
ffffffffc20f1e0d: 48 c1 e2 20 shl $0x20,%rdx
ffffffffc20f1e11: 48 09 d0 or %rdx,%rax
ffffffffc20f1e14: c3 retq
That's pretty horrendous. So it looks like you're right on all
counts.
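For anyone without the tree handy, here is a rough sketch of the two
styles being compared. The names ex_le64_to_cpu(), ex_le64_load() and
ex_l64get() are made up for illustration and are not the Akaros
definitions, and the #if assumes the GCC/Clang __BYTE_ORDER__ macros and
__builtin_bswap64(). The open-coded one takes a byte pointer on purpose:
the loads at 0x8, 0x10, ... 0x38 in the listing above step by eight
bytes, which is what a byte-indexing helper would produce if handed a
uint64_t pointer, so the test is probably worth re-running with a
uint8_t pointer before drawing firm conclusions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ifdef-style helper (not the Akaros definition): a no-op on
 * little-endian hosts, a byte swap on big-endian ones.  Assumes the GCC/Clang
 * __BYTE_ORDER__ macros and __builtin_bswap64(). */
static inline uint64_t ex_le64_to_cpu(uint64_t le)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return le;
#else
	return __builtin_bswap64(le);
#endif
}

/* Load a little-endian u64 from a possibly unaligned buffer via the helper. */
static inline uint64_t ex_le64_load(const void *p)
{
	uint64_t raw;

	memcpy(&raw, p, sizeof(raw));
	return ex_le64_to_cpu(raw);
}

/* Hypothetical open-coded helper: assemble the value one byte at a time.
 * Note the uint8_t pointer, so p[i] indexes bytes, not 64-bit words. */
static inline uint64_t ex_l64get(const uint8_t *p)
{
	return (uint64_t)p[0]       | (uint64_t)p[1] << 8  |
	       (uint64_t)p[2] << 16 | (uint64_t)p[3] << 24 |
	       (uint64_t)p[4] << 32 | (uint64_t)p[5] << 40 |
	       (uint64_t)p[6] << 48 | (uint64_t)p[7] << 56;
}
```

Both ex_le64_load(buf) and ex_l64get(buf) should return the same value
for the same eight bytes on any host. Whether the compiler folds the
open-coded one into a single load depends on the compiler and version,
so the asm still needs checking.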
Rob made a couple of other points about people screwing up the
conversions, but I don't see how the open-coded version competes
(especially since it doesn't protect you against all endian mistakes:
you still need to remember to call the helpers, for instance).
Anyway, given that, feel free to code yours up however you'd like. =)
Unless other people have strong opinions, I'd also be in favor of
removing all of the open-coded endian converters and using one set of
correct static inlines for all of this stuff (i.e. consolidating our
existing half-dozen sets of converters into one good set).
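As a strawman for what "one good set" could look like, here is a
minimal sketch for the 32-bit case only. Everything here is
hypothetical: the ex_ prefix, the EX_HOST_IS_LE macro and the
ex_put_be32()/ex_get_be32() accessors are names invented for this
example, not existing Akaros helpers, and it again assumes GCC/Clang
(__BYTE_ORDER__, __builtin_bswap32()). The 16- and 64-bit variants would
follow the same pattern.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang predefined byte-order macros are available. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define EX_HOST_IS_LE 1
#else
#define EX_HOST_IS_LE 0
#endif

/* Value conversions.  A byte swap is its own inverse, so the to/from
 * pairs share one implementation. */
static inline uint32_t ex_cpu_to_le32(uint32_t x)
{
	return EX_HOST_IS_LE ? x : __builtin_bswap32(x);
}

static inline uint32_t ex_cpu_to_be32(uint32_t x)
{
	return EX_HOST_IS_LE ? __builtin_bswap32(x) : x;
}

static inline uint32_t ex_le32_to_cpu(uint32_t x)
{
	return ex_cpu_to_le32(x);
}

static inline uint32_t ex_be32_to_cpu(uint32_t x)
{
	return ex_cpu_to_be32(x);
}

/* Buffer accessors: memcpy keeps them safe for unaligned pointers. */
static inline void ex_put_be32(void *dst, uint32_t host)
{
	uint32_t wire = ex_cpu_to_be32(host);

	memcpy(dst, &wire, sizeof(wire));
}

static inline uint32_t ex_get_be32(const void *src)
{
	uint32_t wire;

	memcpy(&wire, src, sizeof(wire));
	return ex_be32_to_cpu(wire);
}
```

The idea is that callers only ever touch wire-format data through
helpers like these, so there is a single place to audit and a single
place to look at the generated asm.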
Barret