On Fri, Nov 15, 2013 at 03:48:58PM -0800, Ben Pfaff wrote:
> On Wed, Nov 13, 2013 at 01:32:42PM -0800, Jarno Rajahalme wrote:
> > Add raw_ctz64(), ctz64(), and popcount64() using builtins when
> > available.
> >
> > I'm not sure if the "UINT64_MAX == ~0UL" and "UINT64_MAX == ~0ULL"
> > work in all cases as I imagine they would.
>
> I think you could use ULONG_MAX and ULLONG_MAX, any reason not to?
>
> > Signed-off-by: Jarno Rajahalme <[email protected]>
I had another thought.
It is really convenient how rightmost_1bit() and zero_rightmost_1bit()
work with any width integer. It would be nice if we could just make
ctz() and raw_ctz() do the same.
I was able to make rightmost_1bit() and zero_rightmost_1bit() work
that way because when I did some tests with GCC, I found that GCC was
smart enough to compile the code with the cheapest instructions. That
is, if you pass in a uint32_t it didn't bother to do 64-bit
arithmetic.
I see that GCC isn't smart enough to do that with __builtin_ctzll(),
but you can fake it with a small amount of manual effort. That is,
the following C file:
#include <stdint.h>
static inline int ctz(unsigned long long int x)
{
if (__builtin_constant_p(x <= UINT32_MAX) && x <= UINT32_MAX) {
return __builtin_ctz(x);
} else {
return __builtin_ctzll(x);
}
}
int ctzl(unsigned long int x)
{
return ctz(x);
}
int ctzll(unsigned long long int x)
{
return ctz(x);
}
compiles to this on 32-bit:
00000000 <ctzl>:
0: 0f bc 44 24 04 bsf 0x4(%esp),%eax
5: c3 ret
6: 8d 76 00 lea 0x0(%esi),%esi
9: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
00000010 <ctzll>:
10: 83 ec 1c sub $0x1c,%esp
13: 8b 44 24 20 mov 0x20(%esp),%eax
17: 8b 54 24 24 mov 0x24(%esp),%edx
1b: 89 04 24 mov %eax,(%esp)
1e: 89 54 24 04 mov %edx,0x4(%esp)
22: e8 fc ff ff ff call 23 <ctzll+0x13>
23: R_386_PC32 __ctzdi2
27: 83 c4 1c add $0x1c,%esp
2a: c3 ret
and to this on 64-bit:
0000000000000000 <ctzl>:
0: 48 0f bc c7 bsf %rdi,%rax
4: c3 retq
5: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1)
c: 00 00 00 00
0000000000000010 <ctzll>:
10: 48 0f bc c7 bsf %rdi,%rax
14: c3 retq
which is just about perfect.
What do you think?
Thanks,
Ben.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev