Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> We are looking to implement saturation support in the compiler.  The aim is
> to recognize both Scalar and Vector variants of typical saturating
> expressions.
>
> As an example:
>
> 1. Saturating addition:
>    char sat (char a, char b)
>    {
>       int tmp = a + b;
>       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
>
> 2. Saturating abs:
>    char sat (char a)
>    {
>       int tmp = abs (a);
>       return tmp > 127 ? 127 : ((tmp < -128) ? -128 : tmp);
>    }
>
> 3. Rounding shifts
>    char rndshift (char dc)
>    {
>       int round_const = 1 << (shift - 1);
>       return (dc + round_const) >> shift;
>    }
>
> etc.
>
> Of course the first issue is that C does not really have a single idiom for
> expressing this.
>
> At the RTL level we have ss_truncate and us_truncate and float_truncate for
> truncation.
>
> At the Tree level we have nothing for truncation (I believe) for scalars.
> For Vector code there already seems to be VEC_PACK_SAT_EXPR but it looks
> like nothing actually generates this at the moment; it's just an unused
> tree code.
>
> For rounding there doesn't seem to be any existing infrastructure.
>
> The proposal to handle these is as follows.  Keep in mind that all of these
> also exist in their scalar form, so detecting them in the vectorizer would
> be the wrong place.
>
> 1. Rounding:
>    a) Use match.pd to rewrite various rounding idioms to shifts.
>    b) Use backwards or forward prop to rewrite these to internal functions
>       where even if the target does not support these rounding instructions
>       they have a chance to provide a more efficient implementation than
>       what would be generated normally.
>
> 2. Saturation:
>    a) Use match.pd to rewrite the various saturation expressions into
>       min/max operations, which opens up the expressions to further
>       optimizations.
>    b) Use backwards or forward prop to convert to internal functions if the
>       resulting min/max expression still meets the criteria for being a
>       saturating expression.  This follows the algorithm as outlined in
>       "The Software Vectorization handbook" by Aart J.C. Bik.
>
>       We could get the right instructions by using combine if we don't
>       rewrite the instructions to an internal function, however then during
>       Vectorization we would overestimate the cost of performing the
>       saturation.  The constants will then also be loaded into registers,
>       and so it becomes a lot more difficult to clean up solely in the
>       backend.
>
> The one thing I am wondering about is whether we would need an internal
> function for all operations supported, or if it should be modelled as an
> internal FN which just "marks" the operation as rounding/saturating.  After
> all, the only difference between a normal and saturating expression in RTL
> is the xx_truncate RTL surrounding the expression.  Doing so would also mean
> that all targets that have saturating instructions would automatically
> benefit from this.

I might have misunderstood what you meant here, but the *_truncate RTL
codes are true truncations: the operand has to be wider than the result.
Using this representation for general arithmetic is a problem if you're
operating at the maximum size that the target supports natively.
E.g. representing a 64-bit saturating addition as:

  - extend to 128 bits
  - do a 128-bit addition
  - truncate to 64 bits

is going to be hard to cost and code-generate on targets that don't
support native 128-bit operations (or at least, don't support them
cheaply).

This might not be a problem when recognising C idioms, since the C source
code has to be able to do the wider operation before truncating the
result, but it could be a problem if we provide built-in functions or if
we want to introduce compiler-generated saturating operations.

RTL already has per-operation saturation such as ss_plus/us_plus,
ss_minus/us_minus, ss_neg/us_neg, ss_mult/us_mult, ss_div,
ss_ashift/us_ashift and ss_abs.  I think we should do the same in
gimple, using internal functions like you say.

Thanks,
Richard
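
To make the 64-bit case above concrete, here is a minimal sketch of a signed
saturating addition that never leaves 64 bits, which is the behaviour a
per-operation code like ss_plus describes.  This is plain C for illustration
only, not GCC code; the name sat_add_i64 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC API: signed saturating 64-bit addition
   done without widening to 128 bits.  The range checks happen before the
   addition, so the addition itself can never overflow.  */
int64_t
sat_add_i64 (int64_t a, int64_t b)
{
  /* Adding a positive b would exceed INT64_MAX: clamp to the maximum.  */
  if (b > 0 && a > INT64_MAX - b)
    return INT64_MAX;

  /* Adding a negative b would go below INT64_MIN: clamp to the minimum.  */
  if (b < 0 && a < INT64_MIN - b)
    return INT64_MIN;

  /* The result is known to be in range here.  */
  return a + b;
}
```

Written in the extend/add/truncate form instead, the same operation would
need a 128-bit intermediate type, which is the costing problem described
above.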