On Sun, 29 May 2016, Marc Glisse wrote:
> On Sat, 28 May 2016, Alexander Monakov wrote:
> 
> > For unsigned A, B, 'A > -1 / B' is a nice predicate for checking whether
> > 'A*B' overflows (or 'B && A > -1 / B' if B may be zero).  Let's optimize
> > it to an invocation of __builtin_mul_overflow to avoid the divide
> > operation.
> 
> I forgot to ask earlier: what does this give for modes / platforms where
> umulv4 does not have a specific implementation? Is the generic implementation
> worse than A>-1/B, in which case we may want to check optab_handler before
> doing the transformation? Or is it always at least as good?
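
For reference, the transformation under discussion rewrites, in effect, the
first form below into the second (the function names here are mine, purely
for illustration):

  /* Overflow predicate written with a divide; b may be zero.  */
  int
  mul_overflows_div (unsigned a, unsigned b)
  {
    return b && a > -1U / b;
  }

  /* What the transformed code amounts to.  */
  int
  mul_overflows_builtin (unsigned a, unsigned b)
  {
    unsigned res;
    return __builtin_mul_overflow (a, b, &res);
  }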

If umulv<mode>4 is unavailable (which today is everywhere except x86), gcc
falls back as follows.  First, it checks whether the multiplication can be
done in a 2x wider type (which it usually can, as gcc supports __int128_t
on 64-bit platforms and 64-bit long long on 32-bit platforms), and then it
checks the high bits of the 2x wide product.  This should boil down to a
'high multiply' instruction if the original operands' type matches the
register size, and a normal multiply plus masking of the high bits if the
type is smaller than a register.
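
Concretely, for 32-bit operands this first fallback is morally equivalent
to something like the following (assuming 32-bit unsigned int and 64-bit
unsigned long long; just a sketch of the idea, not what gcc literally
emits):

  int
  mul_overflow_widen (unsigned int a, unsigned int b, unsigned int *res)
  {
    /* Multiply in a 2x wider type, then test the high half.  */
    unsigned long long wide = (unsigned long long) a * b;
    *res = (unsigned int) wide;
    return (wide >> 32) != 0;
  }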

Second, if the above fails (e.g. with 64-bit operands on a 32-bit platform),
then gcc emits a sequence that performs the multiplication by parts in a 2x
narrower type.
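
Sketched in C, that by-parts expansion is along these lines (again assuming
32-bit unsigned int and 64-bit unsigned long long; illustration only, not
gcc's actual output):

  int
  mul_overflow_parts (unsigned long long a, unsigned long long b,
                      unsigned long long *res)
  {
    unsigned int alo = (unsigned int) a, ahi = (unsigned int) (a >> 32);
    unsigned int blo = (unsigned int) b, bhi = (unsigned int) (b >> 32);

    /* a*b = ahi*bhi*2^64 + (ahi*blo + alo*bhi)*2^32 + alo*blo.  */
    unsigned long long lo = (unsigned long long) alo * blo;
    unsigned long long mid = (unsigned long long) ahi * blo
                             + (unsigned long long) alo * bhi;

    *res = lo + (mid << 32);

    if (ahi && bhi)
      return 1;                 /* The 2^64 term alone overflows.  */
    /* Otherwise at most one of the two middle products is nonzero, so
       'mid' did not wrap; overflow iff 'mid' plus the carry out of the
       low product reaches bit 32.  */
    return ((mid + (lo >> 32)) >> 32) != 0;
  }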

I think the first, more commonly taken fallback path always results in
code that is at least as good.  In the second case, the eliminated 64-bit
divide is unlikely to have direct hw support; e.g., on i386 it's a library
call to __udivdi3.  That makes the transformation a likely loss for code
size, but a likely win for performance.  It could be better still if GCC
could CSE REALPART (IFN_MUL_OVERFLOW) with A*B on gimple.
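
That is, in code like the following the product computed for the overflow
check could be reused instead of being recomputed (hypothetical example,
for illustration):

  unsigned
  checked_mul (unsigned a, unsigned b)
  {
    if (b && a > -1U / b)       /* becomes a MUL_OVERFLOW check */
      __builtin_abort ();
    return a * b;               /* could reuse REALPART of that result */
  }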

Thanks.
Alexander
