On 01/23/2016 06:09 AM, Ajit Kumar Agarwal wrote:
This patch improves the updated memory cost in the coloring pass of the integrated register allocator. Only enter_freq of the loop is considered in the updated memory cost in the coloring pass. Considering only enter_freq is based on the concept that what is live-out of the entry or header of the loop is live-in and live-out throughout the loop. Exit freq is ignored in the updated memory cost in the coloring pass.

As we put stores for spilled pseudos on loop entry and loads on loop exits, ignoring loop exits means to me that we basically ignore the cost of the loads, which is probably wrong in the general case.

This increases the updated memory cost, which gives more chances of reducing spills and fetches and of getting a better assignment.
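
As a rough illustration of the objection about loop exits, here is a toy model of the cost in question. It is only a sketch, not the code in the allocator: the function name, its parameters, and all the numbers are hypothetical, and the only assumption taken from the discussion is that a pseudo spilled inside a loop needs a store on each loop entry and a load on each loop exit.

```python
def spill_around_loop_cost(store_cost, load_cost, enter_freq, exit_freq,
                           count_exits=True):
    """Toy cost of keeping a pseudo in memory inside a loop.

    Stores are placed on loop entry edges and loads on loop exit edges.
    All names and values here are illustrative assumptions.
    """
    cost = store_cost * enter_freq          # stores on loop entry
    if count_exits:
        cost += load_cost * exit_freq       # loads on loop exit
    return cost


# Hypothetical costs and edge frequencies.
full = spill_around_loop_cost(4, 4, enter_freq=10, exit_freq=10)
entry_only = spill_around_loop_cost(4, 4, enter_freq=10, exit_freq=10,
                                    count_exits=False)

# The entry-only variant leaves out exactly load_cost * exit_freq.
print(full, entry_only, full - entry_only)
```

In this model the entry-only variant differs from the full cost by the load term alone, which is the part the comment above says should not be dropped in the general case.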

The concept that what is live-out of the loop header is live-in and live-out throughout the loop is based on the following.

If a variable v is live-out at the header of the loop, then v is live-in at every node in the loop. To prove this, consider a loop L with header h such that the variable v, defined at d, is live-in at h. Since v is live at h, d is not part of L. This follows from the dominance property, i.e. h is strictly dominated by d. Furthermore, there exists a path from h to a use of v which does not go through d. For every node p in the loop, since the loop is a strongly connected component of the CFG, there exists a path, consisting only of nodes of L, from p to h. Concatenating these two paths proves that v is live-in and live-out at p.
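
The liveness claim can be checked on a small example. The sketch below uses a textbook backward dataflow iteration on a hypothetical five-node CFG (node names, the CFG shape, and the function name are all assumptions for illustration, not anything from the patch): v is defined in d before the loop, the loop is h -> a -> b -> h, and the only use of v is in x after the loop exit.

```python
def liveness(succ, use, defs):
    """Iterative backward liveness analysis over a CFG.

    succ maps each node to its successors; use and defs map each node
    to the sets of variables it reads and writes.
    """
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # live-out is the union of the successors' live-in sets
            out = set()
            for s in succ[n]:
                out |= live_in[s]
            # live-in is what the node uses plus what passes through it
            inn = use[n] | (out - defs[n])
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out


# Hypothetical CFG: d defines v, the loop is h -> a -> b -> h, x uses v.
succ = {"d": ["h"], "h": ["a"], "a": ["b"], "b": ["h", "x"], "x": []}
use = {"d": set(), "h": set(), "a": set(), "b": set(), "x": {"v"}}
defs = {"d": {"v"}, "h": set(), "a": set(), "b": set(), "x": set()}

live_in, live_out = liveness(succ, use, defs)
loop = ["h", "a", "b"]
for n in loop:
    print(n, "v" in live_in[n], "v" in live_out[n])
```

Even though no node inside the loop uses v, it comes out live-in and live-out at every loop node, because each node can reach h and h can reach the use in x without passing through d.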

Bootstrapped on X86_64.

Performance runs were done on the SPEC CPU2000 benchmarks; the results follow.

SPEC INT benchmarks
(Mean score with this patch vs. mean score without this patch = 3729.777 vs. 3717.083.)

Benchmark      Gain
186.crafty     2.78%
176.gcc        0.7%
253.perlbmk    0.75%
255.vortex     0.82%

SPEC FP benchmarks
(Mean score with this patch vs. mean score without this patch = 4774.65 vs. 4751.838.)

Benchmark      Gain
168.wupwise    0.77%
171.swim       1.5%
177.mesa       1.2%
200.sixtrack   1.2%
178.galgel     0.6%
179.art        0.6%
183.equake     0.5%
187.facerec    0.7%

Thanks for trying to improve GCC performance, Ajit. Unfortunately, I got different numbers on SPEC2000 with your patch. The different results might be a consequence of a different test setup.

I got the following numbers on a 4.2GHz i7-4790K (Haswell) using -Ofast -mtune=corei7. Using the tune option is important, as RA will try to improve code for the Haswell architecture.

64-bit:
Int 5123 5126
FP 6886 6897

32-bit:
Int 4754 4763
FP 6363 6346

Here the first column is GCC with your patch and the second one is without your patch. Only the 32-bit FP score was improved by your patch. These days practically nobody uses 32-bit code for FP benchmarks.
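
For scale, the relative differences between the two columns can be worked out as below. This is just arithmetic on the four score pairs listed above; the dictionary layout and the function name are illustrative choices.

```python
# (with patch, without patch) scores as listed above
scores = {
    ("64-bit", "Int"): (5123, 5126),
    ("64-bit", "FP"): (6886, 6897),
    ("32-bit", "Int"): (4754, 4763),
    ("32-bit", "FP"): (6363, 6346),
}


def relative_change(with_patch, without_patch):
    """Percentage change of the patched score relative to the unpatched one."""
    return 100.0 * (with_patch - without_patch) / without_patch


changes = {key: relative_change(*pair) for key, pair in scores.items()}
for (mode, suite), pct in changes.items():
    print(f"{mode} {suite}: {pct:+.2f}%")
```

All four differences come out below 0.3% in magnitude, and only the 32-bit FP one is positive.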

So unfortunately I cannot approve the patch. Sorry.
