On 01/23/2016 06:09 AM, Ajit Kumar Agarwal wrote:
This patch improves the updated memory cost in the coloring pass of the
integrated register allocator. Only the enter_freq of the loop is
considered in the updated memory cost in the coloring pass. Considering
only enter_freq is based on the concept that whatever is live-out of the
entry or header of the loop is live-in and live-out throughout the loop.
The exit freq is ignored in the updated memory cost in the coloring pass.
As we put stores for spilled pseudos on loop entry and loads on the loop
exits, ignoring loop exits means to me that we basically ignore the
cost of the loads, which is probably wrong in the general case.
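To make the concern concrete, here is a minimal sketch in C of the two ways of charging the loop border. It is not the code from ira-color.c; the struct and function names are hypothetical, and the enter-only variant is just one reading of the patch description.

```c
#include <assert.h>

/* Hypothetical stand-in for the memory move costs of one pseudo.  */
struct mem_move_cost
{
  int store_cost;  /* cost of one store (spill on loop entry) */
  int load_cost;   /* cost of one load (reload on loop exit) */
};

/* Border cost when both edge frequencies are counted: stores are
   weighted by enter_freq and loads by exit_freq.  */
int
border_cost_both (struct mem_move_cost c, int enter_freq, int exit_freq)
{
  assert (enter_freq >= 0 && exit_freq >= 0);
  return c.store_cost * enter_freq + c.load_cost * exit_freq;
}

/* Border cost when only enter_freq is counted (assumed reading of the
   patch): the load term on the loop exits disappears entirely.  */
int
border_cost_enter_only (struct mem_move_cost c, int enter_freq)
{
  assert (enter_freq >= 0);
  return c.store_cost * enter_freq;
}
```

With a store cost of 4, a load cost of 6, enter_freq 10 and exit_freq 3, the first function charges 58 and the second only 40; the difference is exactly the cost of the loads on the exits.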
This increases the updated memory cost and gives more chances of reducing
spills and fetches and of getting a better assignment.
The concept that what is live-out of the header of the loop is live-in
and live-out throughout the loop is based on the following.
If a variable v is live-out at the header of the loop, then the variable
is live-in at every node in the loop. To prove this, consider a loop L
with header h such that the variable v defined at d is live-in at h.
Since v is live at h, d is not part of L. This follows from the dominance
property, i.e. h is strictly dominated by d. Furthermore, there exists a
path from h to a use of v which does not go through d. For every node p
in the loop, since the loop is a strongly connected component of the CFG,
there exists a path, consisting only of nodes of L, from p to h.
Concatenating these two paths proves that v is live-in and live-out of p.
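The argument above can be checked on a small example. The following is a minimal sketch in C, assuming a hypothetical five-node CFG (node 0 defines v, nodes 1 to 3 form the loop with header 1, node 4 uses v after the loop exit) and a plain iterative liveness computation for a single variable. All names are made up for illustration and nothing here comes from IRA.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_NODES 5
#define MAX_SUCCS 2

struct node
{
  bool def;               /* node defines v */
  bool use;               /* node uses v */
  int n_succs;
  int succs[MAX_SUCCS];
};

/* Hypothetical CFG: 0 -> 1 -> 2 -> 3 -> 1 (back edge), 2 -> 4 (exit).  */
const struct node example_cfg[N_NODES] = {
  { true,  false, 1, { 1 } },     /* 0: d, the definition of v */
  { false, false, 1, { 2 } },     /* 1: loop header h */
  { false, false, 2, { 3, 4 } },  /* 2: loop body with the exit edge */
  { false, false, 1, { 1 } },     /* 3: latch */
  { false, true,  0, { 0 } },     /* 4: use of v outside the loop */
};

/* Iterate the backward liveness equations to a fixed point:
   live_out[n] = OR of live_in over successors,
   live_in[n]  = use[n] || (live_out[n] && !def[n]).  */
void
compute_liveness (const struct node cfg[N_NODES],
                  bool live_in[N_NODES], bool live_out[N_NODES])
{
  bool changed = true;
  memset (live_in, 0, N_NODES * sizeof (bool));
  memset (live_out, 0, N_NODES * sizeof (bool));
  while (changed)
    {
      changed = false;
      for (int n = N_NODES - 1; n >= 0; n--)
        {
          bool out = false;
          for (int i = 0; i < cfg[n].n_succs; i++)
            {
              assert (cfg[n].succs[i] >= 0 && cfg[n].succs[i] < N_NODES);
              out = out || live_in[cfg[n].succs[i]];
            }
          bool in = cfg[n].use || (out && !cfg[n].def);
          if (in != live_in[n] || out != live_out[n])
            {
              live_in[n] = in;
              live_out[n] = out;
              changed = true;
            }
        }
    }
}

/* True if v is both live-in and live-out at every listed loop node.  */
bool
live_through_loop (const bool live_in[], const bool live_out[],
                   const int loop[], int n_loop)
{
  for (int i = 0; i < n_loop; i++)
    if (!live_in[loop[i]] || !live_out[loop[i]])
      return false;
  return true;
}
```

Calling compute_liveness on example_cfg and then live_through_loop on the nodes 1, 2 and 3 is one way to see the property on this CFG. It only illustrates the claim on one graph and says nothing about which edge frequencies the cost should use.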
Bootstrapped on X86_64.
Performance run is done on SPEC CPU2000 benchmarks and the following are
the results.
SPEC INT benchmarks
(Mean score with this patch vs mean score without this patch = 3729.777
vs 3717.083).
Benchmarks Gains
186.crafty = 2.78%
176.gcc = 0.7%
253.perlbmk = 0.75%
255.vortex = 0.82%
SPEC FP benchmarks
(Mean score with this patch vs mean score without this patch = 4774.65
vs 4751.838).
Benchmarks Gains
168.wupwise = 0.77%
171.swim = 1.5%
177.mesa = 1.2%
200.sixtrack = 1.2%
178.galgel = 0.6%
179.art = 0.6%
183.equake = 0.5%
187.facerec = 0.7%
Thanks for trying to improve GCC performance, Ajit. Unfortunately, I
got different numbers on SPEC2000 with your patch. The different
results might be a consequence of a different test setup.
I got the following numbers on a 4.2GHz i7-4790K (Haswell) using -Ofast
-mtune=corei7. Using the tune option is important as RA will try to
improve code for the Haswell architecture.
64-bit:
Int 5123 5126
FP 6886 6897
32-bit:
Int 4754 4763
FP 6363 6346
Here the first column is GCC with your patch and the second one is
without your patch. Only the 32-bit FP score was improved by your patch.
These days practically nobody uses 32-bit code for FP benchmarks.
So unfortunately I cannot approve the patch. Sorry.