Hi Richard,

Thanks for your insightful reply.

On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
> > noce_convert_multiple_sets has been introduced and extended over time to 
> > handle
> > if conversion for blocks with multiple sets. Currently this is focused on
> > register moves and rejects any sort of arithmetic operations.
> >
> > This series is an extension to allow more sequences to take part in if
> > conversion. The first patch is a required change to emit correct code and 
> > the
> > second patch whitelists a larger number of operations through
> > bb_ok_for_noce_convert_multiple_sets.
> >
> > For targets that have a rich selection of conditional instructions,
> > like aarch64, I have seen an ~5x increase of profitable if conversions for
> > multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> > benchmarks and I have not seen performance regressions on either x64 / 
> > aarch64.
>
> Interesting results.  Are you free to say which target you used for aarch64?
>
> If I've understood the cost heuristics correctly, we'll allow a "predictable"
> branch to be replaced by up to 5 simple conditional instructions and an
> "unpredictable" branch to be replaced by up to 10 simple conditional
> instructions.  That seems pretty high.  And I'm not sure how well we
> guess predictability in the absence of real profile information.
>
> So my gut instinct was that the limitations of the current code might
> be saving us from overly generous limits.  It sounds from your results
> like that might not be the case though.
>
> Still, if it does turn out to be the case in future, I agree we should
> fix the costs rather than hamstring the code.
>

My writing may have been confusing, but with "~5x increase of
profitable if conversions" I just meant that ifcvt considers these
profitable, not that they actually are when executed in particular
hardware.
But at the same time I haven't yet seen any obvious performance
regressions in some benchmarks that I have ran.
In any case it could be interesting to microbenchmark branches vs
conditional instructions and see how sane these numbers are.

> > Some samples that previously resulted in a branch but now better use these
> > instructions can be seen in the provided test case.
> >
> > Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
> > failing with an ICE but I believe that it's not an issue of this change.
> > force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
> > is provided through emit_conditional_move.
>
> I guess that needs to be fixed first though.  (Thanks for checking both
> targets.)
>

I get a feeling this may be fixed if I properly take care of your
points 1 & 2 below. I will report on that.

> My main comments on the series are:
>
> (1) It isn't obvious which operations should be included in the list
>     in patch 2 and which shouldn't.  Also, the patch only checks the
>     outermost operation, and so it allows the inner rtxes to be
>     arbitrarily complex.
>
>     Because of that, it might be better to remove the condition
>     altogether and just rely on the other routines to do costing and
>     correctness checks.
>

That is true; I wanted to somehow only allow "normal arithmetic
operations" and avoid generating sequences with stranger codes. I will
try and see what happens if I remove the condition altogether. I also
totally missed the fact that I was allowing arbitrarily complex inner
rtxes so thanks for pointing that out.

> (2) Don't you also need to update the "rewiring" mechanism, to cope
>     with cases where the then block has something like:
>
>       if (a == 0) {
>         a = b op c;       ->    a' = a == 0 ? b op c : a;
>         d = a op b;       ->    d = a == 0 ? a' op b : d;
>       }                         a = a'
>
>     At the moment the code only handles regs and subregs, whereas but IIUC
>     it should now iterate over all the regs in the SET_SRC.  And I suppose
>     that creates the need for multiple possible rewirings in the same insn,
>     so that it isn't a simple insn -> index mapping any more.
>

Indeed, I believe this current patch cannot properly handle these. I
will create testcases for this and see what changes need to be done in
the next iteration so that correct code is generated.

Thanks,
Manolis

> Thanks,
> Richard
>
> >
> >
> > Changes in v2:
> >         - Change "conditional moves" to "conditional instructions"
> >         in bb_ok_for_noce_convert_multiple_sets's comment.
> >
> > Manolis Tsamis (2):
> >   ifcvt: handle sequences that clobber flags in
> >     noce_convert_multiple_sets
> >   ifcvt: Allow more operations in multiple set if conversion
> >
> >  gcc/ifcvt.cc                                  | 109 ++++++++++--------
> >  .../aarch64/ifcvt_multiple_sets_arithm.c      |  67 +++++++++++
> >  2 files changed, 127 insertions(+), 49 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

Reply via email to