Hi!

On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:
> The define_insns for the modulo operation currently force the target register
> to a distinct reg in preparation for a possible future peephole combining
> div/mod. But this can lead to cases of a needless copy being inserted. Fixed
> with the following patch.

Have you verified those peepholes still match? Do those peepholes
actually improve performance? On new CPUs? The code here says
;; On machines with modulo support, do a combined div/mod the old fashioned
;; method, since the multiply/subtract is faster than doing the mod instruction
;; after a divide.
but that really should not be true: we can do the div and mod in
parallel (except in SMT4 perhaps, which we never schedule for anyway),
so that should always be strictly faster.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

All files in gcc.target/powerpc/ test for this already. Just leave off
the target clause here?

> +/* { dg-require-effective-target powerpc_p9modulo_ok } */

Leave out this line, because ...

> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */

... the -mcpu= forces it to true always.

> +/* Verify r3 is used as source and target, no copy inserted. */
> +/* { dg-final { scan-assembler-not {\mmr\M} } } */

That is probably good enough, yeah, since the test results in only a
handful of insns.


Segher
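
For reference, the "old fashioned" multiply/subtract method that the quoted
rs6000 comment mentions can be sketched in plain C as below. This is only an
illustration of the arithmetic identity; the function names are made up here
and appear neither in the patch nor in the machine description, and nothing
in it says which sequence is actually faster on a given CPU.

```c
#include <assert.h>

/* Hypothetical helper: remainder computed with the modulo operator.
   On a target with a hardware modulo insn this can map to that insn.  */
long long
mod_direct (long long a, long long b)
{
  return a % b;
}

/* Hypothetical helper: the "old fashioned" method.  Reuse the quotient
   from a divide, then multiply and subtract to get the remainder.
   Assumes b != 0 and that a / b does not overflow.  */
long long
mod_via_mulsub (long long a, long long b)
{
  long long q = a / b;		/* divide */
  return a - q * b;		/* multiply, then subtract */
}

/* Check that both forms agree over a small range, including negative
   operands (C division truncates toward zero, so they must match).  */
void
check_mod_equivalence (void)
{
  for (long long a = -50; a <= 50; a++)
    for (long long b = -7; b <= 7; b++)
      if (b != 0)
	assert (mod_via_mulsub (a, b) == mod_direct (a, b));
}
```

When both the quotient and the remainder are needed, the second form only
depends on the divide result, while a separate mod instruction is independent
of the divide and so can issue alongside it, which is the point made above.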