On Mon, Mar 16, 2020 at 06:29:39AM +0000, Yangfei (Felix) wrote:
> Sorry for not getting your point here. 

Trying 7 -> 8:
    7: r99:SI=r103:SI>>0x8
      REG_DEAD r103:SI
    8: r100:SI=r99:SI&0x6
      REG_DEAD r99:SI
Failed to match this instruction:
(set (reg:SI 100)
    (and:SI (lshiftrt:SI (reg:SI 103)
            (const_int 8 [0x8]))
        (const_int 6 [0x6])))

That should match already, perhaps with a splitter.  aarch64 does not
have very generic rotate-and-mask (or -insert) instructions, so the
aarch64 backend needs to help combine with the less trivial cases.

If you have a splitter for *this* one, all else will probably work
"automatically": you split it to two ubfm, and the second of those can
then merge into the compare instruction, and everything works out.

> Also, this issue is there for ports like x86.  If we go that way, then we 
> need to handle each port affected.  

Yes, you need to do target-specific stuff in every backend separately.

> So I am inclined to handle this in an arch-independent way.  

But you don't: the transformation you do looks to be actively harmful
on many targets.  (I haven't tested it yet; a full test currently takes 8h.)

> > What should be tested is what new combinations are done, and which are *no
> > longer* done.
> 
> In theory, we won't lose but emit more zero_extract with my patch.  

Which is a bad idea.  How many machine insns there are matters; the cost
of those insns matters (assuming the cost model matches reality well,
which it has to anyway); latency matters.  How many insns are
zero_extract instead of something else is not a good indicator of
performance.


Segher
