> On 16 Sep 2017, at 1:04 AM, David Edelsohn <dje....@gmail.com> wrote:
> 
> On Tue, Sep 5, 2017 at 5:26 AM, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>> Hi,
>> I have attached revamped version of Kugan's original patch for type promotion
>> (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00472.html)
>> rebased on r249469. The motivation of the pass is to minimize
>> generation of subregs
>> to avoid redundant zero/sign extensions by carrying out computations
>> in PROMOTE_MODE
>> as much as possible on tree-level.
>> 
>> * Working of pass
>> The pass is dominator-based, and tries to promote type of def to
>> PROMOTE_MODE in each gimple stmt. Before beginning domwalk, all the
>> default definitions are promoted to PROMOTE_MODE
>> in promote_all_ssa_defined_with_nop (). The patch adds a new tree code
>> SEXT_EXPR to represent sign-extension on tree level. CONVERT_EXPR is
>> either replaced by an explicit sign or zero extension depending on the
>> signedness of operands.
>> 
>> The core of the pass is in following two routines:
>> a) promote_ssa: This pass looks at the def of gimple_stmt, and
>> promotes type of def to promoted_type in-place if possible. If it
>> cannot promote def in-place, then
>> it transforms:
>> def = def_stmt
>> to
>> new_def = def_stmt
>> def = convert_expr new_def
>> where new_def is a clone of def, and type of def is set to promoted_type.
>> 
>> b) fixup_use: The main intent is to "fix" uses of a promoted variable
>> to preserve semantics
>> of the code, for instance if the variable is used in stmt where it's
>> original type is required.
>> Another case is when def is not promoted by promote_ssa, but some uses
>> could be promoted.
>> 
>> promote_all_stmts () is the driver function that calls fixup_use and
>> promote_ssa for each stmt
>> within the basic block. The pass relies extensively on dom and vrp to
>> remove redundancies generated by the pass and is thus enabled only if
>> vrp is enabled.
>> 
>> Issues:
>> 1] Pass ordering: type-promote pass generates too many redundancies
>> which can hamper other optimizations. One case I observed was on arm
>> when it inhibited path splitting optimization because the number of
>> stmts in basic block exceeded the value of
>> param-max-jump-thread-duplication-stmts. So I placed the pass just
>> before dom. I am not sure if this is a good approach. Maybe the pass
>> itself
>> should avoid generating redundant statements and not rely too much on
>> dom and vrp ?
>> 
>> 2] Redundant copies left after the pass: When it's safe, vrp optimzies
>> _1 = op1 sext op2
>> into
>> _1 = op1
>> 
>> which leaves redundant copies that are not optimized away at GIMPLE level.
>> I placed pass_copy_prop after vrp to eliminate these copies but not sure if 
>> that
>> is ideal. Maybe we should do this during vrp itself as this is the
>> only case that
>> generates redundant copies ?
>> 
>> 3] Backward phi's: Since we traverse in dominated order, fixup_use()
>> gets called first on a ssa_name that is an argument of a backward-phi.
>> If it promotes the type and later if promote_ssa() decides that the
>> ssa_name should not be promoted, then we have to again walk the
>> backward phi's to undo the "incorrect" promotion, which is done by
>> emitting a convert_expr back to the original type from promoted_type.
>> While I suppose it shouldn't affect correctness, it generates
>> redundant casts and was wondering if there's a better approach to
>> handle this issue.
>> 
>> * SPEC2k6 benchmarking:
>> 
>> Results of  benchmarking the patch for aarch64-linux-gnu cortex-a57:
>> 
>> Improved:
>> 401.bzip2          +2.11%
>> 459.GemsFDTD  +1.42%
>> 464.h264ref       +1.96%
>> 471.omnetpp      +1.05%
>> 481.wrf                +0.99%
>> 
>> Regressed:
>> 447.dealII            -1.34%
>> 450.soplex          -1.54%
>> 456.hmmer          -3.79%
>> 482.sphinx3         -2.95%
>> 
>> The remaining benchmarks didn't have much difference. However there
>> was some noise
>> in the above run, and I suppose the numbers are not precise.
>> I will give another run with benchmarking. There was no significant
>> difference in code-size with and without patch for SPEC2k6.
>> 
>> * Validation
>> 
>> The patch passes bootstrap+test on x86_64-unknown-linux-gnu,
>> arm-linux-gnueabihf,
>> aarch64-linux-gnu and ppc64le-linux-gnu. On arm-linux-gnueabihf, and
>> aarch64-linux-gnu, there is following fallout:
>> 
>> 1] gcc.dg/attr-alloc_size-11.c missing range info for signed char
>> (test for warnings, line 50)
>> The issue seems to be that we call reset_flow_sensitive_info(def) if
>> def is promoted and that invalidates value range which is probably
>> causing the regression.
>> 
>> 2] gcc.dg/fold-narrowbopcst-1.c scan-tree-dump optimized " = _.* \\\\+ 156"
>> This requires adjusting the scan-dump.
>> 
>> On ppc64le-linux-gnu, I am observing several regressions in the
>> testsuite. Most of these seem to be because type-promotion is
>> interfering with other optimizations, especially widening_mul and
>> bswap pass. Also observed fallout for some cases due to interference
>> with strlen, tail-call and jump-threading passes. I suppose I am not
>> seeing these on arm/aarch64 because ppc64 defines
>> PROMOTE_MODE to be DImode and aarch64 defines it to be SImode ? I am
>> working to fix performance regressions. However I would be grateful to
>> receive feedback on the pass, if it's going in the right direction.
> 
> I’m glad that this type of optimization is being pursued.

Prathamesh, Thanks for posting the patch here. I’m not subscribed to 
gcc-patches. I will definitely test this patch on the RISC-V port.


> Why does AArch64 define PROMOTE_MODE as SImode?  GCC ports for other
> RISC targets mostly seem to use a 64-bit mode.  Maybe SImode is the
> correct definition based on the current GCC optimization
> infrastructure, but this seems like a change that should be applied to
> all 64 bit RISC targets.

RISC-V defines promote_mode on RV64 to promote SImode to signed DImode 
subregisters.  I did an experiment on RISC-V to not promote SImode to DImode 
and it improved codegen for many of my regression test cases, but unfortunately 
it breaks the RISC-V ABI.

The RISC-V ABI defines that 32-bit values are sign extended to the register 
width when being “passed” i.e. before calls or returns. I believe PROMOTE_MODE 
is part of the mechanism for ABI canonicalisation of 32-bit values in 64-bit 
registers.

#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)     \
  if (GET_MODE_CLASS (MODE) == MODE_INT         \
      && GET_MODE_SIZE (MODE) < UNITS_PER_WORD) \
    {                                           \
      if ((MODE) == SImode)                     \
        (UNSIGNEDP) = 0;                        \
      (MODE) = word_mode;                       \
    }

On RV32 this promote_mode definition has no effect but on RV64 we end up with 
at lot of SImode registers as DImode subregisters in the RTL along with 
sign_extend inserted after any bitwise logical op instructions 
(and/ior/xor/not). We are seeing issues where the promotions are preventing 
elimination of sign_extend, coalescing of shifts and other optimisations (in 
combine, simplify-rtx and ree).

It does seem that a few GCC optimisations don't work quite as well when SImode 
is held in DImode subregisters, certainly on RISC-V, based on my experiments 
tinkering with promote_mode. Also RISC-V only has 64-bit versions of 
and/ior/xor/not, and these ops they have no lateral movement of bits hence they 
preserve sign-extended form when performed on subregisters that are already in 
canonical form, however GCC middle end doesn’t seem to now about this yet. I 
tried defining SImode versions of the bitwise logical ops however it didn’t 
appear to help. There are certainly issues where the use of subregisters appear 
to be preventing optimisations.

I’m still learning about GCC internals so I will study the patch and see how it 
impacts the RISC-V port. I would like to figure out how to teach the middle-end 
that DI mode and/ior/xor/not on already canonical SImode subregister inputs; 
don’t require sign_extend insertions, or that these sign_extend ops can be 
eliminated.

I’m not sure whether we should be solving the RISC-V issues in the middle-end 
or in riscv.md, however adding patterns in the metadata seems to me to be 
papering around issues in the core compiler.

Thanks,
Michael.

Reply via email to