I'm trying to get the Multiprecision Arithmetic Builtins to produce code as efficient as the code generated for wider integer types.
Firstly, I've defined some typedefs:

    typedef unsigned long long unsigned_word;
    typedef __uint128_t unsigned_128;

And a result type that carries two words:

    struct Result
    {
      unsigned_word lo;
      unsigned_word hi;
    };

Then I've defined two functions, both of which should be functionally the same. Each takes four words (the low and high halves of two 128-bit numbers), adds the two numbers and returns the result. Here's the first:

    Result f (unsigned_word lo1, unsigned_word hi1, unsigned_word lo2, unsigned_word hi2)
    {
      Result x;
      unsigned_128 n1 = lo1 + (static_cast<unsigned_128>(hi1) << 64);
      unsigned_128 n2 = lo2 + (static_cast<unsigned_128>(hi2) << 64);
      unsigned_128 r1 = n1 + n2;
      x.lo = r1 & ((static_cast<unsigned_128>(1) << 64) - 1);
      x.hi = r1 >> 64;
      return x;
    }

This inlines nicely at a high optimisation level and produces the following very nice assembly on x86:

    movq 8(%rsp), %rsi
    movq (%rsp), %rbx
    addq 24(%rsp), %rsi
    adcq 16(%rsp), %rbx

But then I attempted to do the same thing with the multiprecision builtins:

    Result g (unsigned_word lo1, unsigned_word hi1, unsigned_word lo2, unsigned_word hi2)
    {
      Result x;
      unsigned_word carryout;
      unsigned_word final_carry;  // carry out of the high word; Result has no field for it
      x.lo = __builtin_addcll(lo1, lo2, 0, &carryout);
      x.hi = __builtin_addcll(hi1, hi2, carryout, &final_carry);
      return x;
    }

The code above is simpler, but produces worse assembly:

    movq 24(%rsp), %rsi
    movq (%rsp), %rbx
    addq 16(%rsp), %rbx   // Line 1
    addq 8(%rsp), %rsi
    adcq $0, %rbx         // Line 2

Notice the additional adc of 0: instead, Line 1 could be removed and Line 2 replaced with:

    adcq 16(%rsp), %rbx

The code generated for the multiprecision builtins gets even worse above 128 bits: instead of compiling to a chain of adc instructions, it produces a mix of or instructions and similar code to save and pass on the carries. So it seems that the "multiprecision builtins" are worse at multiprecision arithmetic than complex bit-fiddling with larger types.
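Since f and g are meant to be functionally the same, one way to confirm that independently of the generated assembly is to compare them on a few edge cases, in particular a carry out of the low word. The following is a minimal sketch. The helper name same is hypothetical, g keeps the final carry in a local because Result has no field for it, and the #else branch is an assumed portable stand-in used only where the compiler does not report __builtin_addcll (it is a Clang builtin and may be missing on other compilers).

```cpp
typedef unsigned long long unsigned_word;
typedef __uint128_t unsigned_128;

struct Result {
  unsigned_word lo;
  unsigned_word hi;
};

// Detect the Clang builtin; the nested #if is skipped entirely on
// compilers that have no __has_builtin at all.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_addcll)
#    define HAVE_ADDCLL 1
#  endif
#endif

// f: add via the 128-bit integer type, as in the post.
static Result f(unsigned_word lo1, unsigned_word hi1, unsigned_word lo2, unsigned_word hi2) {
  Result x;
  unsigned_128 n1 = lo1 + (static_cast<unsigned_128>(hi1) << 64);
  unsigned_128 n2 = lo2 + (static_cast<unsigned_128>(hi2) << 64);
  unsigned_128 r1 = n1 + n2;
  x.lo = r1 & ((static_cast<unsigned_128>(1) << 64) - 1);
  x.hi = r1 >> 64;
  return x;
}

// g: add via the carry chain; the final carry is computed but discarded,
// matching f, which also drops bit 128.
static Result g(unsigned_word lo1, unsigned_word hi1, unsigned_word lo2, unsigned_word hi2) {
  Result x;
  unsigned_word carryout, final_carry;
#ifdef HAVE_ADDCLL
  x.lo = __builtin_addcll(lo1, lo2, 0, &carryout);
  x.hi = __builtin_addcll(hi1, hi2, carryout, &final_carry);
#else
  // Assumed portable stand-in with the same results.
  x.lo = lo1 + lo2;
  carryout = x.lo < lo1;                      // carry out of the low word
  unsigned_word s = hi1 + hi2;
  x.hi = s + carryout;
  final_carry = (s < hi1) | (x.hi < s);       // carry out of the high word
#endif
  (void)final_carry;
  return x;
}

// Hypothetical checker: true if f and g agree on the given inputs.
static bool same(unsigned_word lo1, unsigned_word hi1, unsigned_word lo2, unsigned_word hi2) {
  Result a = f(lo1, hi1, lo2, hi2);
  Result b = g(lo1, hi1, lo2, hi2);
  return a.lo == b.lo && a.hi == b.hi;
}
```

For example, same(~0ULL, 0, 1, 0) exercises the carry from the low word into the high word, and same(~0ULL, ~0ULL, 1, 0) exercises wrap-around of the full 128-bit sum.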
However, the bit-fiddling approach doesn't generalise well, so I'm wondering whether someone can show me how to use the builtins to produce an efficient adc chain (as I presume they are intended to do).
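For reference, the kind of chain being asked about, generalised to an arbitrary number of words, might be written as in the following minimal sketch. The names addc_word and add_words are hypothetical, words are assumed to be stored little-endian (index 0 least significant), and the portable fallback exists only so the sketch builds on compilers without __builtin_addcll. Whether a given Clang version turns this loop into a single adc chain is exactly the open question above, so no claim about the generated code is made here.

```cpp
#include <cstddef>

typedef unsigned long long unsigned_word;

#if defined(__has_builtin)
#  if __has_builtin(__builtin_addcll)
#    define ADDC_HAVE_BUILTIN 1
#  endif
#endif

// Hypothetical helper: one link of the chain, computing a + b + carry_in
// and writing the carry out (0 or 1) to *carry_out.
static inline unsigned_word addc_word(unsigned_word a, unsigned_word b,
                                      unsigned_word carry_in,
                                      unsigned_word *carry_out) {
#ifdef ADDC_HAVE_BUILTIN
  return __builtin_addcll(a, b, carry_in, carry_out);
#else
  unsigned_word s = a + b;         // may wrap
  unsigned_word c1 = s < a;        // carry from a + b
  unsigned_word r = s + carry_in;  // can only wrap when c1 == 0
  unsigned_word c2 = r < s;        // carry from adding carry_in
  *carry_out = c1 | c2;
  return r;
#endif
}

// Hypothetical helper: out = a + b over n little-endian words.
// Returns the carry out of the most significant word.
static unsigned_word add_words(const unsigned_word *a, const unsigned_word *b,
                               unsigned_word *out, std::size_t n) {
  unsigned_word carry = 0;
  for (std::size_t i = 0; i < n; ++i) {
    unsigned_word next;
    out[i] = addc_word(a[i], b[i], carry, &next);
    carry = next;  // feed this word's carry into the next word
  }
  return carry;
}
```

As a usage example, adding 1 to the 256-bit value with all four words set to ~0ULL should leave every word of out at 0 and return a carry of 1, which exercises the carry through every link of the chain.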
_______________________________________________ cfe-users mailing list cfe-users@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-users