On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote: > > > However there are 2 issues with this, one is the spurious subreg, > > > > Combine didn't make that up out of thin air; something already used > > DImode here. It could simplify it to SImode in this case, that is > > true, don't know why it doesn't; it isn't necessarily faster code to > > do so, it can be slower, it might not match, etc. > > The relevant RTL instructions on AArch64 are:
[ You never gave a full test case, or I missed it, or cannot find it anymore -- but I can reproduce this now: void g(void); void f(int *x) { if (*x & 2) g(); } ] > (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ]) > (and:SI (reg/v:SI 76 [ xD.2641 ]) > (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3} > (nil)) > (insn 26 25 27 2 (set (reg:CC 66 cc) > (compare:CC (reg:SI 77 [ D.2705 ]) > (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi} > (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ]) > (nil))) > > I don't see anything using DI... Yeah, I spoke too soon, sorry. It looks like make_compound_operation came up with it. > > It's only a problem for AND-and-compare, no? > > Yes, so it looks like some other backends match the odd pattern and then have > another > pattern change it back into the canonical AND/TST form during the split phase > (maybe > the subreg confuses register allocation or block other optimizations). A subreg of a pseudo is not anything special, don't worry about it, register_operand and similar treat it just like any other register. > This all seems > a lot of unnecessary complexity for a few special immediates when there is a > much > simpler solution... Feel free to post a patch! I would love to have this all simplified. > > > But there are more efficient ways to emit single bit and masks tests that > > > apply > > > to most CPUs rather than doing something specific that works for just one > > > target > > > only. For example single bit test is a simple shift into carry flag or > > > into the > > > sign bit, and for mask tests, you shift out all the non-mask bits. > > > > Most of those are quite target-specific. Some others are already done, > > and/or done by other passes. > > But what combine does here is even more target-specific. Combine puts everything (well, most things) through make_compound_operation, on all targets. > > Combine converts the merged instructions to what it thinks is the > > canonical or cheapest form, and uses that. It does not try multiple > > options (the zero_ext* -> and+shift rewriting is not changing the > > semantics of the pattern at all). > > But the change from AND to zero_extract is already changing semantics... Oh? It is not supposed to! > > > Or would it be better to let each target decide > > > on how to canonicalize bit tests and only try that alternative? > > > > The question is how to write the pattern to be most convenient for all > > targets. > > The obvious choice is to try the 2 original instructions merged. ... without any simplification. Yes, I've wanted combine to fall back to that if the "simplified" version does not work out. Not so easy to do though. > > > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well... > > > > Combine does not try multiple options. > > I'm not following - combine tries zero_extract and shift+AND - that's 2 > options. > If that is feasible then adding a 3rd option should be possible. The shift+and is *exactly the same* as the zero_extract, just written differently. > We certainly need a lot more target hooks in general so GCC can do the right > thing > (rather than using costs inconsistently all over the place). But that's a > different > discussion... This isn't about costs though. That is a big other can of worms, indeed! Anyway. In that testcase I made, everything is simplified just fine on aarch64, using *tbeqdi1; what am I missing? Segher