On 27/04/2007, at 00:14, Lucian Adrian Grijincu wrote:
in apr-conv10-faster.patch you added:
    static const char digits[] = "0123456789";
    *--p = digits[magnitude % 10];
Why is this faster than:
    *--p = (char) '0' + (magnitude % 10);
?
You have to take into account the entire loop. The following:
    do {
        u_widest_int new_magnitude = magnitude / 10;
        *--p = (char) (magnitude - new_magnitude * 10 + '0');
        magnitude = new_magnitude;
    } while (magnitude);
against:
    do {
        *--p = digits[magnitude % 10];
    } while ((magnitude /= 10) != 0);
digits is easily cacheable, fewer assignments.
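For anyone who wants to try the two loops side by side, here is a minimal sketch that wraps each one in a function and checks that they produce the same string. The names conv_sub_mul, conv_table, check_same and CONV_BUFSZ are made up for this example, and uint64_t stands in for u_widest_int; none of this is taken from the patch or from APR itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 20 decimal digits of UINT64_MAX plus a terminating NUL. */
#define CONV_BUFSZ 21

/* Hypothetical wrapper around the first loop: one division per digit,
 * with the remainder recovered by a multiply and a subtract. */
static char *conv_sub_mul(uint64_t magnitude, char *buf_end)
{
    char *p = buf_end;
    *--p = '\0';
    do {
        uint64_t new_magnitude = magnitude / 10;
        *--p = (char) (magnitude - new_magnitude * 10 + '0');
        magnitude = new_magnitude;
    } while (magnitude);
    return p;   /* points at the first digit */
}

/* Hypothetical wrapper around the second loop: the remainder indexes
 * a lookup table, and the division happens in the loop condition. */
static char *conv_table(uint64_t magnitude, char *buf_end)
{
    static const char digits[] = "0123456789";
    char *p = buf_end;
    *--p = '\0';
    do {
        *--p = digits[magnitude % 10];
    } while ((magnitude /= 10) != 0);
    return p;   /* points at the first digit */
}

/* Aborts (via assert) if the two variants disagree for this value. */
static void check_same(uint64_t value)
{
    char a[CONV_BUFSZ], b[CONV_BUFSZ];
    assert(strcmp(conv_sub_mul(value, a + sizeof a),
                  conv_table(value, b + sizeof b)) == 0);
}
```

Both functions fill the buffer backwards from buf_end, so the caller passes a pointer one past the end of the buffer and uses the returned pointer as the start of the string.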
For your "faster" version, under the hood, the C compiler adds (magnitude % 10) to the address of digits and then copies the contents of the memory location represented by the sum's result into *--p.

My version just adds (magnitude % 10) to '0' and stores the result in *--p.
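The two ways of producing a single digit character described above can be isolated like this. It is only a sketch: digit_by_table and digit_by_add are hypothetical names, and an optimizing compiler is free to turn either form into something else entirely, so the comments describe the C-level operation rather than the generated code.

```c
#include <assert.h>

/* Table form: the remainder is used as an index into a constant array,
 * so at the C level the digit is read from memory. */
static char digit_by_table(unsigned r)
{
    static const char digits[] = "0123456789";
    assert(r < 10);
    return digits[r];
}

/* Arithmetic form: the remainder is added to '0'. This relies on the
 * C guarantee that the characters '0'..'9' have consecutive values. */
static char digit_by_add(unsigned r)
{
    assert(r < 10);
    return (char) ('0' + r);
}
```

For every remainder from 0 to 9 both functions return the same character, so the choice between them is purely a question of cost.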
Talk is cheap, let's benchmark!

To see the generated assembly:

    gcc -O2 -o bench bench.c -g
    objdump -S bench > bench-asm

# Intel(R) Celeron(R) CPU 2.20GHz

[EMAIL PROTECTED] ~]$ gcc -o bench bench.c -O2 # uint32_t
[EMAIL PROTECTED] ~]$ ./bench $RANDOM
conv_1
cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236
conv_2
cycles: 236 cycles: 220 cycles: 224 cycles: 220 cycles: 224 cycles: 224 cycles: 224 cycles: 224
conv_1
cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236 cycles: 236
conv_2
cycles: 220 cycles: 224 cycles: 224 cycles: 224 cycles: 224 cycles: 224 cycles: 224 cycles: 224

[EMAIL PROTECTED] ~]$ gcc -o bench bench.c -O2 # uint64_t
[EMAIL PROTECTED] ~]$ ./bench $RANDOM |more
conv_1
cycles: 508 cycles: 532 cycles: 540 cycles: 468 cycles: 468 cycles: 468 cycles: 468
conv_2
cycles: 1188 cycles: 824 cycles: 896 cycles: 828 cycles: 824 cycles: 824 cycles: 820
conv_1
cycles: 524 cycles: 492 cycles: 468 cycles: 504 cycles: 468 cycles: 504 cycles: 468
conv_2
cycles: 768 cycles: 836 cycles: 836 cycles: 820 cycles: 820 cycles: 820 cycles: 820
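For readers without the attachment, a rough timing harness could look like the sketch below. It is not the attached bench.c: it measures CPU time with the standard clock() function rather than counting cycles, and the names time_conv, conv_fn and conv_table are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by the conversion routines under test (hypothetical). */
typedef char *(*conv_fn)(uint64_t magnitude, char *buf_end);

/* One routine to time: the table-lookup loop from the discussion. */
static char *conv_table(uint64_t magnitude, char *buf_end)
{
    static const char digits[] = "0123456789";
    char *p = buf_end;
    *--p = '\0';
    do {
        *--p = digits[magnitude % 10];
    } while ((magnitude /= 10) != 0);
    return p;
}

/* Converts seed, seed + 1, ... for `iters` iterations and returns the
 * elapsed CPU time in seconds as reported by clock(). */
static double time_conv(conv_fn fn, uint64_t seed, unsigned long iters)
{
    char buf[21];             /* 20 digits of UINT64_MAX plus NUL */
    volatile char sink = 0;   /* keeps the optimizer from dropping the calls */
    clock_t start, stop;

    assert(fn != NULL);
    start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        sink ^= *fn(seed + i, buf + sizeof buf);
    }
    stop = clock();
    (void) sink;
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_conv once per routine with the same seed and iteration count gives numbers that can be compared with each other, though clock() is much coarser than a cycle counter, so the iteration count needs to be large.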
Am I missing something here?
Both versions, after compiler optimizations, yield similar results for uint32_t, but the change hurts the uint64_t (apr_uint64_t) case quite a bit. "Faster" was an overstatement; I withdraw apr-conv10-faster.patch.
-- Davi Arnaut
bench.c
Description: Binary data
