Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
> noce_convert_multiple_sets has been introduced and extended over time to 
> handle
> if conversion for blocks with multiple sets. Currently this is focused on
> register moves and rejects any sort of arithmetic operations.
>
> This series is an extension to allow more sequences to take part in if
> conversion. The first patch is a required change to emit correct code and the
> second patch whitelists a larger number of operations through
> bb_ok_for_noce_convert_multiple_sets.
>
> For targets that have a rich selection of conditional instructions,
> like aarch64, I have seen an ~5x increase of profitable if conversions for
> multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> benchmarks and I have not seen performance regressions on either x64 / 
> aarch64.

Interesting results.  Are you free to say which target you used for aarch64?

If I've understood the cost heuristics correctly, we'll allow a "predictable"
branch to be replaced by up to 5 simple conditional instructions and an
"unpredictable" branch to be replaced by up to 10 simple conditional
instructions.  That seems pretty high.  And I'm not sure how well we
guess predictability in the absence of real profile information.

So my gut instinct was that the limitations of the current code might
be saving us from overly generous limits.  It sounds from your results
like that might not be the case though.

Still, if it does turn out to be the case in future, I agree we should
fix the costs rather than hamstring the code.

> Some samples that previously resulted in a branch but now better use these
> instructions can be seen in the provided test case.
>
> Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
> failing with an ICE but I believe that it's not an issue of this change.
> force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
> is provided through emit_conditional_move.

I guess that needs to be fixed first though.  (Thanks for checking both
targets.)

My main comments on the series are:

(1) It isn't obvious which operations should be included in the list
    in patch 2 and which shouldn't.  Also, the patch only checks the
    outermost operation, and so it allows the inner rtxes to be
    arbitrarily complex.

    Because of that, it might be better to remove the condition
    altogether and just rely on the other routines to do costing and
    correctness checks.

(2) Don't you also need to update the "rewiring" mechanism, to cope
    with cases where the then block has something like:

      if (a == 0) {
        a = b op c;       ->    a' = a == 0 ? b op c : a;
        d = a op b;       ->    d = a == 0 ? a' op b : d;
      }                         a = a'

    At the moment the code only handles regs and subregs, whereas but IIUC
    it should now iterate over all the regs in the SET_SRC.  And I suppose
    that creates the need for multiple possible rewirings in the same insn,
    so that it isn't a simple insn -> index mapping any more.

Thanks,
Richard

>
>
> Changes in v2:
>         - Change "conditional moves" to "conditional instructions"
>         in bb_ok_for_noce_convert_multiple_sets's comment.
>
> Manolis Tsamis (2):
>   ifcvt: handle sequences that clobber flags in
>     noce_convert_multiple_sets
>   ifcvt: Allow more operations in multiple set if conversion
>
>  gcc/ifcvt.cc                                  | 109 ++++++++++--------
>  .../aarch64/ifcvt_multiple_sets_arithm.c      |  67 +++++++++++
>  2 files changed, 127 insertions(+), 49 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c

Reply via email to