On Fri, Mar 13, 2020 at 03:21:18AM +0000, Yangfei (Felix) wrote:
> > On Wed, Mar 04, 2020 at 08:39:36AM +0000, Yangfei (Felix) wrote:
> > > This is a simple fix for PR94026.
> > > With this fix, combine will try make an extraction if we are in a
> > > equality
> > comparison and this is an AND
> > > with a constant which is power of two minus one. Shift here should be
> > > an
> > constant. For example, combine
> > > will transform (compare (and (lshiftrt x 8) 6) 0) to (compare
> > > (zero_extract
> > (x 2 9)) 0).
> >
> > Why is that a good thing?
>
> The reported test case is reduced from spec2017 541.leela_r. I have pasted
> original code snippet on the bugzilla.
> We found other compilers like aocc/llvm can catch this pattern and simplify
> it.
That wasn't my question, let me rephrase: why would writing it as
zero_extract (instead of as a more canonical form) be wanted?
The aarch backend only has zero_extract formulations for most of the
bitfield instructions. If you fix that problem, all of this should go
away? Like, the testcase in the PR starts with
Trying 7 -> 8:
7: r99:SI=r103:SI>>r104:SI#0
REG_DEAD r104:SI
REG_DEAD r103:SI
8: r100:SI=r99:SI&0x6
REG_DEAD r99:SI
Failed to match this instruction:
(set (reg:SI 100)
(and:SI (ashiftrt:SI (reg:SI 103)
(subreg:QI (reg:SI 104) 0))
(const_int 6 [0x6])))
and that should match already (that's an ubfm (ubfx))?
> > (There should be thorough tests on many archs, showing it helps on average,
> > and it doesn't regress anything. I can do that for you, but not right now).
>
> I only have aarch64 & x86_64 linux available and have tested this patch with
> spec17 on both platforms.
> No obvious improvement & regression witnessed. This is expected as only one
> instruction is reduced here.
What should be tested is what new combinations are done, and which are
*no longer* done.
> > In general, we should have *fewer* zero_extract, not more.
Some reasons for that:
1) All those can be expressed with simpler operations as well;
2) Most very similar expressions cannot be expressed as zero_extract,
although many architectures can handle (some of) those just fine;
3) The optimizers do not handle zero_extract very well at all (this
includes simplify-rtx, to start with).
sign_extract is nastier -- we really want to have a sign_extend that
works on separate bits, not as coarse as address units as we have now --
but it currently isn't handled much either.
Segher