On Wed, Sep 17, 2014 at 12:47 PM, Michael Meissner
<meiss...@linux.vnet.ibm.com> wrote:
> This patch is an intermediate step of what I want to do to improve power8
> fusion.
>
> In the current trunk, the fusion support for gpr loads is done by a peephole2
> to find the addis followed by the load instruction where the only consumer of
> the addis instruction is the load, and it rewrites the addis to use the
> register that will be loaded, and emits the two separate instructions.  There
> is then a normal peephole that recognizes the addis/load combination and makes
> sure the two instructions are emitted together, along with a comment, to make
> tracking of the fusion attempts easier.  The problem is that later passes, such
> as the second scheduler pass, move instructions around and often move the
> addis away from the load, so the normal peephole pass never sees the two
> instructions as an adjacent pair.
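
The adjacency problem described above can be sketched with a toy model. This is only an illustration under stated assumptions: the `insn` struct, the opcode names, and both helper functions are hypothetical and are not GCC's RTL or peephole machinery. It shows that a matcher which only looks at neighbouring instructions stops matching once anything is scheduled between the addis and the load.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes; hypothetical, not GCC's RTL codes. */
enum op { OP_ADDIS, OP_LOAD, OP_OTHER };

struct insn {
    enum op op;
    int dest;   /* register written */
    int base;   /* base register read, or -1 if none */
};

/* Count the addis/load pairs a two-instruction peephole window would
   match: the load must immediately follow the addis, use the addis
   result as its base, and load into that same register. */
size_t count_fusable(const struct insn *code, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (code[i].op == OP_ADDIS
            && code[i + 1].op == OP_LOAD
            && code[i + 1].base == code[i].dest
            && code[i + 1].dest == code[i].dest)
            hits++;
    }
    return hits;
}

/* Model one scheduler move: swap insn i with its successor. */
void swap_adjacent(struct insn *code, size_t n, size_t i)
{
    assert(i + 1 < n);
    struct insn tmp = code[i];
    code[i] = code[i + 1];
    code[i + 1] = tmp;
}
```

In this model, an addis followed directly by its load counts as one fusable pair; swapping the load with the instruction after it leaves the same three instructions in the stream but drops the count to zero.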
>
> This patch creates a new insn that combines the two parts, so that the
> scheduler2 pass won't split up the two insns.  Static analysis of the
> generated code shows many more fused pairs.  For instance, with these patches
> 400.perlbench generates over 11,300 additional load fusions, 403.gcc
> generates 23,000 more, and 416.gamess generates 39,000 more.
>
> However, when SPEC 2006 is run on a power8, you don't actually see much of a
> performance difference with these patches.  In digging into it, the main place
> where fusion occurs is in referencing static/global variables.  The SPEC 2006
> suite does not tend to have that much static/global data, so the linker
> optimizes most of the addis instructions to be nops, and the load index
> register is adjusted to use r2.  These optimizations should help much larger
> code bases that do have a lot more static/global data.
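
For readers unfamiliar with where these pairs come from, here is a minimal sketch of the kind of source that references static data. The variable and function names are hypothetical, and the assembly in the comment is only a rough illustration of a TOC-relative access; the exact output depends on the compiler version, code model, and flags.

```c
/* Hypothetical file-scope data, standing in for the static/global
   variables the message refers to. */
static long hit_count = 41;

/* On 64-bit PowerPC ELF, a read like this is typically addressed
   relative to the TOC pointer in r2, roughly:

       addis 3,2,hit_count@toc@ha
       ld    3,hit_count@toc@l(3)

   (illustrative only).  Using the loaded register as the addis target
   is the form the message describes as fusable.  When the high part of
   the offset is zero, the linker can turn the addis into a nop and make
   the load use r2 directly, as described above. */
long read_hit_count(void)
{
    return hit_count;
}

/* A writer, so the compiler cannot treat hit_count as a constant. */
void bump_hit_count(void)
{
    hit_count++;
}
```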
>
> I've done bootstraps on both a big endian power7 and a little endian power8
> with no regressions.  Are these patches ok to install in the trunk, and the
> 4.8/4.9 branches?
>
> 2014-09-16  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         * config/rs6000/predicates.md (fusion_gpr_mem_load): Move testing
>         for base_reg_operand to be common between LO_SUM and PLUS.
>         (fusion_gpr_mem_combo): New predicate to match a fused address
>         that combines the addis and memory offset address.
>
>         * config/rs6000/rs6000-protos.h (fusion_gpr_load_p): Change
>         calling signature.
>         (emit_fusion_gpr_load): Likewise.
>
>         * config/rs6000/rs6000.c (fusion_gpr_load_p): Change calling
>         signature to pass each argument separately, rather than
>         using an operands array.  Rewrite the insns found by peephole2 to
>         be a single insn, rather than hoping the insns will still be
>         together when the peephole pass is done.  Drop being called via a
>         normal peephole.
>         (emit_fusion_gpr_load): Change calling signature to be called from
>         the fusion_gpr_load_<mode> insns with a combined memory address
>         instead of the peephole pass passing the addis and offset
>         separately.
>
>         * config/rs6000/rs6000.md (UNSPEC_FUSION_GPR): New unspec for GPR
>         fusion.
>         (power8 fusion peephole): Drop support for doing power8 fusion via
>         a normal peephole that was created by the peephole2 pass.
>         (power8 fusion peephole2): Create a new insn with the fused
>         address, so that the fused operation is kept together after
>         register allocation is done.
>         (fusion_gpr_load_<mode>): Likewise.

Okay.

Thanks, David
