On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:
> > > However there are 2 issues with this, one is the spurious subreg,
> > 
> > Combine didn't make that up out of thin air; something already used
> > DImode here.  It could simplify it to SImode in this case, that is
> > true, don't know why it doesn't; it isn't necessarily faster code to
> > do so, it can be slower, it might not match, etc.
> 
> The relevant RTL instructions on AArch64 are:

[ You never gave a full test case, or I missed it, or cannot find it
  anymore -- but I can reproduce this now:

void g(void);
void f(int *x) { if (*x & 2) g(); }

]

> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>         (and:SI (reg/v:SI 76 [ xD.2641 ])
>             (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>      (nil))
>  (insn 26 25 27 2 (set (reg:CC 66 cc)
>         (compare:CC (reg:SI 77 [ D.2705 ])
>             (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>      (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>         (nil)))
> 
> I don't see anything using DI...

Yeah, I spoke too soon, sorry.  It looks like make_compound_operation came
up with it.

> > It's only a problem for AND-and-compare, no?
> 
> Yes, so it looks like some other backends match the odd pattern and then have
> another pattern change it back into the canonical AND/TST form during the
> split phase (maybe the subreg confuses register allocation or blocks other
> optimizations).

A subreg of a pseudo is not anything special, don't worry about it;
register_operand and similar treat it just like any other register.

> This all seems like a lot of unnecessary complexity for a few special
> immediates when there is a much simpler solution...

Feel free to post a patch!  I would love to have this all simplified.

> > > But there are more efficient ways to emit single-bit and mask tests that
> > > apply to most CPUs, rather than doing something specific that works for
> > > just one target only.  For example, a single-bit test is a simple shift
> > > into the carry flag or into the sign bit, and for mask tests, you shift
> > > out all the non-mask bits.
> > 
> > Most of those are quite target-specific.  Some others are already done,
> > and/or done by other passes.
> 
> But what combine does here is even more target-specific.

Combine puts everything (well, most things) through
make_compound_operation, on all targets.
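
[ Editor's aside, not from the thread: a rough C-level sketch of the shift
  tricks Wilco describes above, assuming 32-bit unsigned int.  The function
  names are invented for illustration; the point is only that a single-bit
  test (like the *x & 2 in the snippet earlier) can be done by shifting the
  bit to the top of the word, and a contiguous-mask test by shifting out the
  non-mask bits:

#include <assert.h>

/* Illustrative only: two spellings of "is bit 1 of x set?".  */
static int bit1_and (unsigned x)     { return (x & 2u) != 0; }
static int bit1_shift (unsigned x)   { return (x << 30) >> 31; }  /* bit 1 -> top bit -> LSB */

/* Two spellings of "is any of the top 8 bits of x set?".  */
static int himask_and (unsigned x)   { return (x & 0xff000000u) != 0; }
static int himask_shift (unsigned x) { return (x >> 24) != 0; }   /* shift out the non-mask bits */

int main (void)
{
  for (unsigned i = 0; i < (1u << 20); i++)
    {
      unsigned x = i * 2654435761u;  /* spread sample values over the word */
      assert (bit1_and (x) == bit1_shift (x));
      assert (himask_and (x) == himask_shift (x));
    }
  return 0;
}
]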

> > Combine converts the merged instructions to what it thinks is the
> > canonical or cheapest form, and uses that.  It does not try multiple
> > options (the zero_ext* -> and+shift rewriting is not changing the
> > semantics of the pattern at all).
> 
> But the change from AND to zero_extract is already changing semantics...

Oh?  It is not supposed to!

> > > Or would it be better to let each target decide
> > > on how to canonicalize bit tests and only try that alternative?
> > 
> > The question is how to write the pattern to be most convenient for all
> > targets.
> 
> The obvious choice is to try the 2 original instructions merged.

... without any simplification.  Yes, I've wanted combine to fall back
to that if the "simplified" version does not work out.  Not so easy to
do though.

> > > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
> > 
> > Combine does not try multiple options.
> 
> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
> If that is feasible then adding a 3rd option should be possible.

The shift+and is *exactly the same* as the zero_extract, just written
differently.
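
[ Editor's gloss, not from the thread: for this particular test the two
  spellings are, roughly (modulo modes, operand details and the enclosing
  compare),

    (zero_extract:SI (reg:SI 76) (const_int 1) (const_int 1))

  versus

    (and:SI (lshiftrt:SI (reg:SI 76) (const_int 1)) (const_int 1))

  i.e. "extract bit 1 of x" written two ways; the value is identical. ]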

> We certainly need a lot more target hooks in general so GCC can do the right
> thing (rather than using costs inconsistently all over the place).  But that's
> a different discussion...

This isn't about costs though.  That is a whole other can of worms, indeed!


Anyway.  In that testcase I made, everything is simplified just fine on
aarch64, using *tbeqdi1; what am I missing?


Segher
