On 04/10/2012 09:35 AM, Richard Sandiford wrote:
Hi Vlad,
Back in Decemember, when we were still very much in stage 3, I sent
an RFC about an alternative implementation of -fsched-pressure.
Just wanted to send a reminder now that we're in the proper stage:
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html
Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got
reasonable results. (I mentioned bad Power 7 results in that message,
because of the way the VSX_REGS class is handled. Ulrich's results
are without -mvsx though.)
The condition I orignally set myself was that this patch should only
go in if it becomes the default on at least one architecture,
specifically ARM. Ulrich tells me that Linaro have now made it
the default for ARM in their GCC 4.7 release, so hopefully Ramana
would be OK with doing the same in upstream 4.8.
I realise the whole thing is probably more complicated and ad-hoc
than you'd like. Saying it can't go in is a perfectly acceptable
answer IMO.
I have a mixed feeling with the patch. I've tried it on SPEC2000 on
x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5%
(SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in
comparison with the current algorithm. It is slower too. Although the
difference is quite insignificant on Corei7, compiler speed slowdown
achieves 0.4% on SPECFP2000 on arm. The algorithm also generates slower
code on x86 (1.5% on SPECINT and 5% on SPECFP200) and practically the
same average code on x86-64 and ARM (I've tried only SPECINT on ARM).
On the other hand, I don't think that 1st insn scheduling will be ever
used for x86. And although the SPECFP2000 rate is the same on x86-64 I
saw that some SPECFP2000 tests benefit from your algorithm on x86-64
(one amazing difference is 70% improvement on swim on x86-64 although it
might be because of different reasons like alignment or cache
behaviour). So I think the algorithm might work better on processors
with more registers.
I gues it is ok to sumbit this work to the trunk. It may be useful for
achieving better performance for specific tests or to make it a default
for some targets if it is proven to be better.
As for the patch itself, I think you should document the option in
doc/invoke.texi. It is missed. Another minor mistake I found is one
line garbage (I guess from -fira-algorithm) in description of
-fsched-pressure-algorithm in common.opt.
Thanks, Richard.
By the way, the code in scheduler has been changed since you made a
patch and you need to do some merging first.