Hi Richard,

> "Roger Sayle" <ro...@nextmovesoftware.com> writes:
> > This patch addresses PR rtl-optimization/106594, a significant
> > performance regression affecting aarch64 recently introduced (exposed)
> > by one of my recent RTL simplification improvements.  Firstly many
> > thanks to Tamar Christina for confirming that the core of this patch
> > provides ~5% performance improvement on some on his benchmarks.
> >
> > GCC's combine pass uses the function expand_compound_operation to
> > conceptually simplify/canonicalize SIGN_EXTEND and ZERO_EXTEND as a
> > pair of shift operations, as not all targets support extension
> > instructions [technically ZERO_EXTEND may potentially be simplified/
> > canonicalized to an AND operation, but the theory remains the same].
> 
> Are you sure the reason is that not all targets support extension 
> instructions?
> I thought in practice this routine would only tend to see ZERO_EXTEND etc. if
> those codes appeared in the original rtl insns.

Excellent question.  My current understanding is that this subroutine is used
for both SIGN_EXTEND/ZERO_EXTEND (for which may processors have instructions
and even addressing mode support) and also SIGN_EXTRACT/ZERO_EXTRACT for
which many platforms, really do need a pair of shift instructions. (or an AND).

The bit (to me) that that's suspicious is the exact wording of the comment...

> >   /* Convert sign extension to zero extension, if we know that the high
> >      bit is not set, as this is easier to optimize.  It will be converted
> >      back to cheaper alternative in make_extraction.  */

Notice that things are converted back in "make_extraction", and may not be
getting converted back (based on empirical evidence) for non-extractions,
i.e. extensions.  This code is being called for ZERO_EXTEND on a path that
doesn't subsequently call make_compound.  As shown in the PR, there are
code generation differences (that impact performance), but I agree there's
some ambiguity around the intent of the original code.

My personal preference is to write backend patterns that contain ZERO_EXTEND
(or SIGN_EXTEND) rather than a pair of shifts, or an AND of a paradoxical 
SUBREG.
For example, the new patterns added to i386.md look (to me) "cleaner" than the
forms that they complement/replace.  The burden is then on the middle-end to
simplify {ZERO,SIGN}_EXTEND forms as efficiently as it would shifts or ANDs.

Interestingly, this is sometimes already the case, for example, we simplify
(ffs (zero_extend ...)) in cases where we wouldn't simplify the equivalent
(ffs (and ... 255)) [but these are perhaps also just missed optimizations].

Thoughts?  I'll also dig more into the history of these (combine) functions.

Cheers,
Roger
--


Reply via email to