On 2014-06-16, 10:14 AM, Ajit Kumar Agarwal wrote:
Hello All:

I have worked on the Open64 compiler, where register-pressure-guided Unroll 
and Jam gave a good performance improvement on the C and C++ SPEC 
benchmarks and also on the Fortran benchmarks.

Unroll and Jam increases the register pressure in the unrolled loop, leading 
to more spills and fetches, which degrades the performance of the unrolled 
loop. The cache-locality benefit achieved through Unroll and Jam is 
offset by the spill instructions caused by the increased register 
pressure. It is better to decide the unroll factor of the loop based on 
a performance model of the register pressure.
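
As a small illustration of the effect described above (not code from Open64 
or GCC), the sketch below shows a matrix-vector loop nest before and after 
Unroll and Jam by a factor of 2. The function names and the assumption that 
n is even are hypothetical choices made for the example. The jammed version 
reuses each x[j] across two rows, but it also keeps two accumulators live 
instead of one, which is where the extra register pressure comes from.

```c
#include <stddef.h>

/* Original nest: one accumulator is live across the inner loop. */
void matvec(size_t n, const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Outer loop unrolled by 2, with the two inner-loop copies jammed together.
 * Assumption for this sketch: n is even, so no remainder loop is needed.
 * x[j] is loaded once and reused for both rows, but two accumulators
 * (plus xj) are now live across the inner loop. */
void matvec_uj2(size_t n, const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i += 2) {
        double sum0 = 0.0, sum1 = 0.0;
        for (size_t j = 0; j < n; j++) {
            double xj = x[j];
            sum0 += a[i * n + j] * xj;
            sum1 += a[(i + 1) * n + j] * xj;
        }
        y[i] = sum0;
        y[i + 1] = sum1;
    }
}
```

With a larger factor the number of live accumulators grows with it, so at 
some point the allocator has to spill inside the inner loop.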

Most loop optimizations, like Unroll and Jam, are implemented in the 
high-level IR. Register-pressure-based Unroll and Jam requires calculating 
register pressure in the high-level IR, similar to the register pressure we 
calculate during register allocation. This makes the implementation 
complex.

To overcome this, the Open64 compiler makes unrolling decisions both in the 
high-level IR and at the code generation level. Some of the decisions are 
made way at the end of code generation. The advantage of an approach like 
Open64's is that it can use the register pressure information calculated by 
the register allocator. This makes the implementation much simpler.
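
To make the idea of a pressure-guided decision concrete, here is a minimal 
sketch of picking an unroll factor under a register budget. It is not the 
Open64 or GCC algorithm. The linear cost model and the names 
estimated_pressure and pick_unroll_factor are assumptions made for 
illustration, and the model stands in for whatever estimate the register 
allocator would actually report.

```c
/* Hypothetical linear model: every jammed copy of the loop body adds
 * per_copy live values on top of a shared base. A real compiler would
 * query its register allocator instead of using this formula. */
static int estimated_pressure(int base, int per_copy, int factor)
{
    return base + per_copy * factor;
}

/* Returns the largest factor in [2, max_factor] whose estimated pressure
 * fits in avail_regs, or 1 (no unrolling) if no such factor exists. */
int pick_unroll_factor(int base, int per_copy, int avail_regs, int max_factor)
{
    int best = 1;
    for (int f = 2; f <= max_factor; f++) {
        if (estimated_pressure(base, per_copy, f) <= avail_regs)
            best = f;
    }
    return best;
}
```

For example, with a base of 4 live values, 3 more per copy, and 16 
available registers, the largest factor that fits is 4.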

Can we have this approach in GCC: make the Unroll and Jam decisions in the 
high-level IR and defer some of the decisions to the code generation level, 
like Open64?

Please let me know what you think.


Most loop optimizations are a good target for register-pressure-sensitive algorithms, as loops are usually program hot spots and any pressure increase there would be harmful, since no RA can undo such complex transformations.

So I guess your proposal could work. Right now we have only pressure-sensitive modulo scheduling (SMS) and loop-invariant motion. As I remember, switching loop-invariant motion from a very inaccurate register-pressure evaluation to one based on RA pressure evaluation gave a nice improvement of about 1% for SPECFP2000 on some targets.
