Hi, > -----Original Message----- > From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] > Sent: Saturday, March 14, 2020 12:07 AM > To: Yangfei (Felix) <felix.y...@huawei.com> > Cc: gcc-patches@gcc.gnu.org; Zhanghaijian (A) <z.zhanghaij...@huawei.com> > Subject: Re: [PATCH PR94026] combine missed opportunity to simplify > comparisons with zero > > On Fri, Mar 13, 2020 at 03:21:18AM +0000, Yangfei (Felix) wrote: > > > On Wed, Mar 04, 2020 at 08:39:36AM +0000, Yangfei (Felix) wrote: > > > > This is a simple fix for PR94026. > > > > With this fix, combine will try make an extraction if we are in > > > > a equality > > > comparison and this is an AND > > > > with a constant which is power of two minus one. Shift here > > > > should be an > > > constant. For example, combine > > > > will transform (compare (and (lshiftrt x 8) 6) 0) to (compare > > > > (zero_extract > > > (x 2 9)) 0). > > > > > > Why is that a good thing? > > > > The reported test case is reduced from spec2017 541.leela_r. I have pasted > original code snippet on the bugzilla. > > We found other compilers like aocc/llvm can catch this pattern and simplify > > it. > > That wasn't my question, let me rephrase: why would writing it as zero_extract > (instead of as a more canonical form) be wanted?
Sorry for not getting your point here. > The aarch backend only has zero_extract formulations for most of the bitfield > instructions. If you fix that problem, all of this should go away? Like, the > testcase in the PR starts with > > Trying 7 -> 8: > 7: r99:SI=r103:SI>>r104:SI#0 > REG_DEAD r104:SI > REG_DEAD r103:SI > 8: r100:SI=r99:SI&0x6 > REG_DEAD r99:SI > Failed to match this instruction: > (set (reg:SI 100) > (and:SI (ashiftrt:SI (reg:SI 103) > (subreg:QI (reg:SI 104) 0)) > (const_int 6 [0x6]))) > > and that should match already (that's an ubfm (ubfx))? For aarch64, if we use "ubfm/ubfx" for the reduced test case, then we still need to do a compare with zero. Then we won't get the benefit. For aarch64, we need to emit a "tst" instruction here. So we need to catch something like: 149 (set (reg:CC_NZ 66 cc) 150 (compare:CC_NZ (and:SI (lshiftrt:SI (reg:SI 102) 151 (const_int 8 [0x8])) 152 (const_int 6 [0x6])) 153 (const_int 0 [0]))) But this pattern is not accurate enough: we can only accept equality comparison with zero here (as indicated by the checking of equality_comparison in my original patch). Also, this issue is there for ports like x86. If we go that way, then we need to handle each port affected. So I am inclined to handle this in an arch-independent way. I looked into tree phases like fwprop & fold-const before, but didn't see an appropriate point to catch this opportunity. Then I came to the combine phase. > > > > (There should be thorough tests on many archs, showing it helps on > > > average, and it doesn't regress anything. I can do that for you, but not > right now). > > > > I only have aarch64 & x86_64 linux available and have tested this patch with > spec17 on both platforms. > > No obvious improvement & regression witnessed. This is expected as only > one instruction is reduced here. > > What should be tested is what new combinations are done, and which are *no > longer* done. In theory, we won't lose but emit more zero_extract with my patch. > > > In general, we should have *fewer* zero_extract, not more. > > Some reasons for that: > > 1) All those can be expressed with simpler operations as well; > 2) Most very similar expressions cannot be expressed as zero_extract, > although many architectures can handle (some of) those just fine; > 3) The optimizers do not handle zero_extract very well at all (this includes > simplify-rtx, to start with). > > sign_extract is nastier -- we really want to have a sign_extend that works on > separate bits, not as coarse as address units as we have now -- but it > currently > isn't handled much either. Thanks for explaining this. I have to admit that I didn't realize this issue when I was creating my original patch. Felix