On Mon, Mar 16, 2020 at 06:29:39AM +0000, Yangfei (Felix) wrote: > Sorry for not getting your point here.
Trying 7 -> 8: 7: r99:SI=r103:SI>>0x8 REG_DEAD r103:SI 8: r100:SI=r99:SI&0x6 REG_DEAD r99:SI Failed to match this instruction: (set (reg:SI 100) (and:SI (lshiftrt:SI (reg:SI 103) (const_int 8 [0x8])) (const_int 6 [0x6]))) That should match already, perhaps with a splitter. aarch64 does not have very generic rotate-and-mask (or -insert) instructions, so the aarch64 backend needs to help combine with the less trivial cases. If you have a splitter for *this* one, all else will probably work "automatically": you split it to two ubfm, and the second of those can then merge into the compare instruction, and everything works out. > Also, this issue is there for ports like x86. If we go that way, then we > need to handle each port affected. Yes, you need to do target-specific stuff in every backend separately. > So I am inclined to handle this in an arch-independent way. But you don't. The transformation you do looks to be actively harmful on many targets. (I haven't tested it yet, that takes 8h currently). > > What should be tested is what new combinations are done, and which are *no > > longer* done. > > In theory, we won't lose but emit more zero_extract with my patch. Which is a bad idea. How many machine insns are there matters; the cost of those matters (assuming that matches reality well, which it has to anyway); latency matters. How many insns are zero_extract instead of something else is not a good indicator of performance. Segher