Hi All,

James and I have been investigating this regression and have tracked it down to register allocation.

I have created a new PR with our findings, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782, but unfortunately we don't know how to proceed. This does seem like a genuine bug in RA. It looks like some magic threshold has been crossed, but we're having trouble determining what this magic number is.

Any help is appreciated.

Thanks,
Tamar

> -----Original Message-----
> From: Xionghu Luo <luo...@linux.ibm.com>
> Sent: Friday, October 16, 2020 9:47 AM
> To: Tamar Christina <tamar.christ...@arm.com>; Martin Jambor
> <mjam...@suse.cz>; Richard Sandiford <richard.sandif...@arm.com>;
> luoxhu via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: seg...@kernel.crashing.org; wschm...@linux.ibm.com;
> li...@gcc.gnu.org; Jan Hubicka <hubi...@ucw.cz>; dje....@gmail.com
> Subject: Re: [PATCH] ipa-inline: Improve growth accumulation for recursive
> calls
>
> On 2020/9/12 01:36, Tamar Christina wrote:
> > Hi Martin,
> >
> >> can you please confirm that the difference between these two is all
> >> due to the last option -fno-inline-functions-called-once ? Is LTO
> >> necessary? I.e., can you run the benchmark also built with the
> >> branch compiler and -mcpu=native -Ofast -fomit-frame-pointer
> >> -fno-inline-functions-called-once ?
> >
> > Done, see below.
> >
> >>> +----------+------------------------------------------------+-------+
> >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto | -24%  |
> >>> +----------+------------------------------------------------+-------+
> >>> | Branch   | -mcpu=native -Ofast -fomit-frame-pointer       | -26%  |
> >>> +----------+------------------------------------------------+-------+
> >>>
> >>> (Hopefully the table shows up correct)
> >>
> >> it does show OK for me, thanks.
> >>
> >>> It looks like your patch definitely does improve the basic cases. So
> >>> there's not much difference between lto and non-lto anymore and it's
> >>> much better than GCC 10. However it still contains the regression
> >>> introduced by Honza's changes.
> >>
> >> I assume these are rates, not times, so negative means bad. But do I
> >> understand it correctly that you're comparing against GCC 10 with the
> >> two parameters set to rather special values? Because your table
> >> seems to indicate that even for you, the branch is faster than GCC 10
> >> with just -mcpu=native -Ofast -fomit-frame-pointer.
> >
> > Yes these are indeed rates, and indeed I am comparing against the same
> > options we used to get the fastest rates on before which is the two
> > parameters and the inline flag.
> >
> >> So is the problem that the best obtainable run-time, even with
> >> obscure options, from the branch is slower than the best obtainable
> >> run-time from GCC 10?
> >
> > Yeah that's the problem, when we compare the two we're still behind.
> >
> > I've done the additional two runs
> >
> > +----------+---------------------------------------------------------------+-------------+
> > | Compiler | Flags                                                         | diff GCC 10 |
> > +----------+---------------------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                |             |
> > |          | --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 |             |
> > |          | -fno-inline-functions-called-once                             |             |
> > +----------+---------------------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer                      | -44%        |
> > +----------+---------------------------------------------------------------+-------------+
> > | GCC 10   | -mcpu=native -Ofast -fomit-frame-pointer -flto                | -36%        |
> > +----------+---------------------------------------------------------------+-------------+
> > | GCC 11   | -mcpu=native -Ofast -fomit-frame-pointer -flto                | -12%        |
> > |          | --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 |             |
> > |          | -fno-inline-functions-called-once                             |             |
> > +----------+---------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                | -22%        |
> > |          | --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 |             |
> > +----------+---------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                | -12%        |
> > |          | --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 |             |
> > |          | -fno-inline-functions-called-once                             |             |
> > +----------+---------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                | -24%        |
> > +----------+---------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer                      | -26%        |
> > +----------+---------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer -flto                | -12%        |
> > |          | -fno-inline-functions-called-once                             |             |
> > +----------+---------------------------------------------------------------+-------------+
> > | Branch   | -mcpu=native -Ofast -fomit-frame-pointer                      | -11%        |
> > |          | -fno-inline-functions-called-once                             |             |
> > +----------+---------------------------------------------------------------+-------------+
> >
> > And this confirms that indeed LTO isn't needed and that the branch
> > without any options is indeed much better than it was on GCC 10 without
> > any options.
> >
> > It also confirms that the only remaining difference is in the
> > -fno-inline-functions-called-once
>
> If -fno-inline-functions-called-once is added, the recursive call function
> digits_2 won't be inlined, as each digits_2 is specialized to clone nodes and
> called once only, so performance back is expected, I guess it is somewhat
> similar to -fno-inline for this case.
>
> @Jambor @Honza Any progress about this (--param controlling maximal
> recursion depth) and the other regression about
> LOOP_GUARD_WITH_PREDICTION in
> PR96825 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825) please? :) I
> tested the current master FSF code, the regression still exists...
>
> Thanks,
> Xionghu
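
A note on reading the "diff GCC 10" figures quoted above: they are relative changes in benchmark rate, so a negative value means slower than the baseline configuration. The thread does not spell out the formula. The sketch below assumes the usual (candidate - baseline) / baseline definition; the function name and the sample rates are made up for illustration and are not measurements from this thread.

```python
def rate_diff_percent(baseline_rate: float, candidate_rate: float) -> float:
    """Relative change of a benchmark rate versus a baseline, in percent.

    Assumption (not stated in the thread): diff = (candidate - baseline) / baseline.
    Rates are "higher is better", so a negative result means a regression.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (candidate_rate - baseline_rate) * 100.0 / baseline_rate


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show the sign convention.
    baseline = 50.0   # stand-in for the best baseline configuration
    candidate = 44.0  # stand-in for another compiler/flag combination
    print(f"{rate_diff_percent(baseline, candidate):+.0f}%")
```

With this convention a candidate that matches the baseline gives 0%, and anything below the baseline rate gives a negative percentage, which is how the tables are read in the discussion.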