On Sun, 29 May 2016, Marc Glisse wrote:
> On Sat, 28 May 2016, Alexander Monakov wrote:
> >
> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
> > 'A*B' overflows (or 'B && A > -1 / B' if B may be zero). Let's optimize
> > it to an invocation of __builtin_mul_overflow to avoid the divide
> > operation.
>
> I forgot to ask earlier: what does this give for modes / platforms where
> umulv4 does not have a specific implementation? Is the generic
> implementation worse than A>-1/B, in which case we may want to check
> optab_handler before doing the transformation? Or is it always at least
> as good?
If umulv<mode>4 is unavailable (which today means everywhere except x86), gcc falls back as follows.

First, it tries to do the multiplication in a 2x wider type if one is available (which it usually is, as gcc supports __int128_t on 64-bit platforms and 64-bit long long on 32-bit platforms), then looks at the high bits of the 2x-wide product. This should boil down to a 'high multiply' instruction if the original operands' type matches the register size, and a normal multiply plus masking of the high bits if the type is narrower than a register.

Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform), gcc emits a sequence that performs the multiplication by parts in a 2x narrower type.

I think the first, more commonly taken fallback path always results in good code. In the second case, the eliminated 64-bit divide is unlikely to have direct hw support; e.g., on i386 it is a library call to __udivdi3. That makes the transformation a likely loss for code size, but a likely win for performance. It could be better still if GCC could CSE REALPART (IFN_MUL_OVERFLOW) with A*B on gimple.

Thanks.
Alexander