Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> We are looking to implement saturation support in the compiler.  The aim is to
> recognize both scalar and vector variants of typical saturating expressions.
>
> As an example:
>
> 1. Saturating addition:
>    signed char sat (signed char a, signed char b)
>    {
>       int tmp = a + b;
>       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
>
> 2. Saturating abs:
>    signed char sat (signed char a)
>    {
>       int tmp = abs (a);
>       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
>
> 3. Rounding shifts:
>    char rndshift (char dc, int shift)
>    {
>       int round_const = 1 << (shift - 1);
>       return (dc + round_const) >> shift;
>    }
>
> etc.
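The idiom in example 3 is only safe because `dc` is promoted to `int` before the rounding constant is added; at full register width the same addition can overflow. Below is a minimal sketch that compares the widened idiom with an equivalent formulation needing no wider type. It assumes 32-bit operands, a shift amount between 1 and 31, and GCC's arithmetic right shift of negative values. `rndshift_wide`, `rndshift_narrow` and `check_rndshift` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The idiom from example 3, done in int64_t so that adding the
   rounding constant cannot overflow.  Assumes 1 <= shift <= 31.  */
int32_t rndshift_wide (int32_t x, int shift)
{
  int64_t round_const = (int64_t) 1 << (shift - 1);
  return (int32_t) (((int64_t) x + round_const) >> shift);
}

/* Same round-half-up result without widening: bit (shift - 1) of x is
   set exactly when the discarded low bits are >= half the divisor, so
   adding it back rounds.  Assumes >> on negative values is an
   arithmetic shift, as GCC defines it.  */
int32_t rndshift_narrow (int32_t x, int shift)
{
  return (x >> shift) + ((x >> (shift - 1)) & 1);
}

/* Compare both forms on edge cases for every valid shift amount.  */
void check_rndshift (void)
{
  const int32_t samples[] = { 0, 1, 5, -5, 6, -6, INT32_MAX, INT32_MIN };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    for (int s = 1; s < 32; s++)
      assert (rndshift_wide (samples[i], s)
              == rndshift_narrow (samples[i], s));
}
```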
>
> Of course the first issue is that C does not really have a single idiom for
> expressing this.
>
> At the RTL level we have ss_truncate and us_truncate and float_truncate for
> truncation.
>
> At the tree level we have nothing for truncation (I believe) for scalars.  For
> vector code there already seems to be VEC_PACK_SAT_EXPR, but it looks like
> nothing actually generates it at the moment; it's just an unused tree code.
>
> For rounding there doesn't seem to be any existing infrastructure.
>
> The proposal to handle these is as follows.  Keep in mind that all of these
> also exist in their scalar form, so detecting them in the vectorizer would be
> the wrong place.
>
> 1. Rounding:
>    a) Use match.pd to rewrite various rounding idioms to shifts.
>    b) Use backwards or forward prop to rewrite these to internal functions,
>       so that even if the target does not support these rounding
>       instructions it has a chance to provide a more efficient implementation
>       than what would be generated normally.
>
> 2. Saturation:
>    a) Use match.pd to rewrite the various saturation expressions into min/max
>       operations which opens up the expressions to further optimizations.
>    b) Use backwards or forward prop to convert to internal functions if the
>       resulting min/max expression still meets the criteria for being a
>       saturating expression.  This follows the algorithm as outlined in "The
>       Software Vectorization Handbook" by Aart J.C. Bik.
>
>       We could get the right instructions by using combine if we didn't
>       rewrite the instructions to an internal function, but then during
>       vectorization we would overestimate the cost of performing the
>       saturation.  The constants would then also be loaded into registers and
>       so become a lot more difficult to clean up solely in the backend.
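Below is a minimal sketch of step 2a applied to example 1. It assumes a signed 8-bit `char`, written as `int8_t` here to avoid the implementation-defined signedness of plain `char`. The nested conditional is the same clamp as a MIN/MAX pair, and that is the form a later pass would test against the saturation criteria. The `MIN`/`MAX` macros and function names are illustrative, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(x, y) ((x) < (y) ? (x) : (y))
#define MAX(x, y) ((x) > (y) ? (x) : (y))

/* Example 1 as written: nested conditionals on the widened sum.  */
int8_t sat_add_cond (int8_t a, int8_t b)
{
  int tmp = a + b;
  return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
}

/* The same clamp as MIN (MAX (tmp, lo), hi).  When lo/hi are exactly
   INT8_MIN/INT8_MAX and tmp is the widened sum of two int8_t values,
   this is a signed saturating 8-bit addition.  */
int8_t sat_add_minmax (int8_t a, int8_t b)
{
  int tmp = a + b;
  return (int8_t) MIN (MAX (tmp, INT8_MIN), INT8_MAX);
}

/* Exhaustively compare both forms over all 65536 input pairs.  */
void check_sat_add (void)
{
  for (int a = INT8_MIN; a <= INT8_MAX; a++)
    for (int b = INT8_MIN; b <= INT8_MAX; b++)
      assert (sat_add_cond ((int8_t) a, (int8_t) b)
              == sat_add_minmax ((int8_t) a, (int8_t) b));
}
```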
>
> The one thing I am wondering about is whether we would need an internal
> function for every supported operation, or whether it should be modelled as
> an internal FN that just "marks" the operation as rounding/saturating.  After
> all, the only difference between a normal and a saturating expression in RTL
> is the xx_truncate RTL surrounding the expression.  Doing so would also mean
> that all targets that have saturating instructions would automatically
> benefit from this.

I might have misunderstood what you meant here, but the *_truncate
RTL codes are true truncations: the operand has to be wider than the
result.  Using this representation for general arithmetic is a problem
if you're operating at the maximum size that the target supports natively.
E.g. representing a 64-bit saturating addition as:

  - extend to 128 bits
  - do a 128-bit addition
  - truncate to 64 bits

is going to be hard to cost and code-generate on targets that don't support
native 128-bit operations (or at least, don't support them cheaply).
This might not be a problem when recognising C idioms, since the C source
code has to be able to do the wider operation before truncating the result,
but it could be a problem if we provide built-in functions or if we want
to introduce compiler-generated saturating operations.
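To make the cost point concrete, here is a sketch of a 64-bit signed saturating addition that never forms a 128-bit intermediate. It assumes GCC or Clang for `__builtin_add_overflow`. The overflow test and the choice of saturation bound both stay at 64 bits, which is the per-operation view rather than extend/add/truncate. `ss_add64` and `check_ss_add64` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturating 64-bit addition without a wider type.
   __builtin_add_overflow stores the wrapped sum in *res and returns
   true if the exact sum does not fit in int64_t.  */
int64_t ss_add64 (int64_t a, int64_t b)
{
  int64_t res;
  if (__builtin_add_overflow (a, b, &res))
    /* Overflow is only possible when a and b have the same sign,
       so the sign of either operand gives the direction.  */
    return a < 0 ? INT64_MIN : INT64_MAX;
  return res;
}

/* A few edge cases, including both saturation directions.  */
void check_ss_add64 (void)
{
  assert (ss_add64 (INT64_MAX, 1) == INT64_MAX);
  assert (ss_add64 (INT64_MIN, -1) == INT64_MIN);
  assert (ss_add64 (INT64_MAX, INT64_MIN) == -1);
  assert (ss_add64 (40, 2) == 42);
}
```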

RTL already has per-operation saturation such as ss_plus/us_plus,
ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
ss_ashift/us_ashift and ss_abs.  I think we should do the same
in gimple, using internal functions like you say.
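For reference, here is a sketch of the scalar semantics of a few of these codes at 8 bits. The C function names only mirror the RTL codes; they are illustrative and are not existing GCC internal functions. C promotes the operands of `+` and `-` to `int`, but each saturation decision below is made on 8-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* us_plus: unsigned add, clamped to UINT8_MAX on wrap-around.  */
uint8_t us_plus8 (uint8_t a, uint8_t b)
{
  uint8_t sum = (uint8_t) (a + b);   /* conversion wraps modulo 256 */
  return sum < a ? UINT8_MAX : sum;  /* wrapped iff sum < a */
}

/* us_minus: unsigned subtract, clamped to 0 instead of wrapping.  */
uint8_t us_minus8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}

/* ss_neg: signed negate; -INT8_MIN does not fit, so it saturates.  */
int8_t ss_neg8 (int8_t a)
{
  return a == INT8_MIN ? INT8_MAX : (int8_t) -a;
}

/* ss_abs: signed absolute value with the same single overflow case.  */
int8_t ss_abs8 (int8_t a)
{
  return a < 0 ? ss_neg8 (a) : a;
}

/* Spot checks of the saturating and non-saturating paths.  */
void check_per_op (void)
{
  assert (us_plus8 (200, 100) == UINT8_MAX);
  assert (us_plus8 (1, 2) == 3);
  assert (us_minus8 (5, 10) == 0);
  assert (us_minus8 (10, 5) == 5);
  assert (ss_neg8 (INT8_MIN) == INT8_MAX);
  assert (ss_abs8 (-5) == 5);
}
```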

Thanks,
Richard
