On Wed, Sep 17, 2014 at 12:47 PM, Michael Meissner <meiss...@linux.vnet.ibm.com> wrote:
> This patch is an intermediate step of what I want to do to improve power8
> fusion.
>
> In the current trunk, the fusion support for gpr loads is done by a peephole2
> to find the addis followed by the load instruction where the only consumer of
> the addis instruction is the load, and it rewrites the addis to use the
> register that will be loaded, and emits the two separate instructions.  There
> is then a normal peephole that recognizes the addis/load combination, and
> makes sure they are emitted together, along with a comment, to make tracking
> of the fusion attempts easier.  The problem is that things like the second
> scheduler pass will move things around, and often move the addis away from
> the load.  This means the normal peephole pass won't see the two instructions.
>
> This patch creates a new insn that combines the two parts, so that the
> scheduler2 pass won't split up the two insns.  In doing static analysis, a lot
> more fused pairs are generated.  For instance, 400.perlbench generates more
> than 11,300 more load fusions with these patches, 403.gcc generates 23,000
> more load fusions, and 416.gamess generates 39,000 more load fusions.
>
> However, when spec 2006 is run on a power8, you don't actually see much of a
> performance difference with these patches.  In digging into it, the main place
> where fusion occurs is in referencing static/global variables.  The spec 2006
> suite does not tend to have that much static/global data, so the linker
> optimizes most of the addis instructions to be nops, and the load index
> register is adjusted to use r2.  These optimizations should help much larger
> code bases that do have a lot more static/global data.
>
> I've done bootstraps on both a big endian power7 and a little endian power8
> with no regressions.  Are these patches ok to install in the trunk, and the
> 4.8/4.9 branches?
>
> 2014-09-16  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         * config/rs6000/predicates.md (fusion_gpr_mem_load): Move testing
>         for base_reg_operand to be common between LO_SUM and PLUS.
>         (fusion_gpr_mem_combo): New predicate to match a fused address
>         that combines the addis and memory offset address.
>
>         * config/rs6000/rs6000-protos.h (fusion_gpr_load_p): Change
>         calling signature.
>         (emit_fusion_gpr_load): Likewise.
>
>         * config/rs6000/rs6000.c (fusion_gpr_load_p): Change calling
>         signature to pass each argument separately, rather than
>         using an operands array.  Rewrite the insns found by peephole2 to
>         be a single insn, rather than hoping the insns will still be
>         together when the peephole pass is done.  Drop being called via a
>         normal peephole.
>         (emit_fusion_gpr_load): Change calling signature to be called from
>         the fusion_gpr_load_<mode> insns with a combined memory address
>         instead of the peephole pass passing the addis and offset
>         separately.
>
>         * config/rs6000/rs6000.md (UNSPEC_FUSION_GPR): New unspec for GPR
>         fusion.
>         (power8 fusion peephole): Drop support for doing power8 via a
>         normal peephole that was created by the peephole2 pass.
>         (power8 fusion peephole2): Create a new insn with the fused
>         address, so that the fused operation is kept together after
>         register allocation is done.
>         (fusion_gpr_load_<mode>): Likewise.

Okay.

Thanks, David
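
For readers following the thread, the peephole2 rewrite described in the quoted message can be illustrated with a toy model. This is only a sketch under stated assumptions, not GCC's code: the real logic lives in the RTL patterns in rs6000.md and in fusion_gpr_load_p, and the `Insn` class, `temp_is_read_later`, and `fuse_addis_loads` below are hypothetical names invented for illustration. The model assumes a straight-line instruction list and treats every register as dead at the end of the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    """Toy instruction (hypothetical, not GCC RTL): dest = op(base, imm)."""
    op: str                      # "addis", "load", "fused_load", or other
    dest: str                    # register written
    base: Optional[str] = None   # register read, if any
    imm: int = 0                 # immediate / displacement


def temp_is_read_later(insns: List[Insn], start: int, reg: str) -> bool:
    """Return True if reg is read before being overwritten, from index start.

    Assumption of this sketch: registers are dead at the end of the list.
    """
    for insn in insns[start:]:
        if insn.base == reg:
            return True
        if insn.dest == reg:
            return False
    return False


def fuse_addis_loads(insns: List[Insn]) -> List[Insn]:
    """Replace adjacent addis/load pairs with a single fused_load insn.

    A pair qualifies only when the load is the sole consumer of the addis
    result, mirroring the condition described in the message. Emitting one
    combined insn means a later pass has nothing to pull apart.
    """
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        fusible = (
            nxt is not None
            and cur.op == "addis"
            and nxt.op == "load"
            and nxt.base == cur.dest
            and (nxt.dest == cur.dest
                 or not temp_is_read_later(insns, i + 2, cur.dest))
        )
        if fusible:
            # addis adds (imm << 16) to the base; fold in the load offset.
            combined = (cur.imm << 16) + nxt.imm
            out.append(Insn("fused_load", nxt.dest, cur.base, combined))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


# Usage: an addis off r2 feeding a single load becomes one fused insn.
example = [Insn("addis", "r9", "r2", 1), Insn("load", "r3", "r9", 8)]
fused = fuse_addis_loads(example)
```

In this model the fused pair is one list element, so any later reordering step moves it as a unit, which is the property the patch aims for by creating a combined insn instead of relying on the two instructions staying adjacent.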