Jiong Wang writes:
> 2015-04-14 18:24 GMT+01:00 Jeff Law <l...@redhat.com>:
>> On 04/14/2015 10:48 AM, Steven Bosscher wrote:
>>>>
>>>> So I think this stage2/3 binary difference is acceptable?
>>>
>>>
>>> No, they should be identical.  If there's a difference, then there's a
>>> bug - which, it seems, you've already found, too.
>>
>> Right.  And so the natural question is how to fix it.
>>
>> At first glance it would seem like having this new code ignore
>> dependencies arising from debug insns would work.
>>
>> Which then begs the question: what happens to the debug insn -- it's
>> certainly not going to be correct anymore if the transformation is made.
>
> Exactly.
>
> The debug_insn 2776 in my example records the base address of a
> local array.  The new code is doing the right thing here by not
> shuffling the operands of insns 2556 and 2557, as there is an additional
> reference to reg:1473 from the debug insn, although the code would still
> execute correctly if we did the transformation.
>
> My understanding of how to fix this:
>
> * Delete the out-of-date, mismatched debug_insn, as there is no
>   guarantee of generating accurate debug info under -O2.
>
>   IMO, this debug_insn may affect the "DW_AT_location" field for the
>   variable description of "classes" in the .debug_info section, but it's
>   omitted in the final output already.
>
>   <3><38a4d>: Abbrev Number: 137 (DW_TAG_variable)
>      <38a4f>   DW_AT_name      : (indirect string, offset: 0x18db): classes
>      <38a53>   DW_AT_decl_file : 1
>      <38a54>   DW_AT_decl_line : 548
>      <38a56>   DW_AT_type      : <0x38cb4>
>
> * Update the debug_insn, if the following change is OK with the DWARF
>   standard:
>
>   from
>
>     insn0:      reg0 = fp + reg1
>     debug_insn: var_loc = reg0 + const_off
>     insn1:      reg2 = reg0 + const_off
>
>   to
>
>     insn0:      reg0 = fp + const_off
>     debug_insn: var_loc = reg0 + reg1
>     insn1:      reg2 = reg0 + reg1
>
> Thanks,
Attached is the new patch, which updates the debug_insn as described in
the second solution above.  The stage2/3 binary differences on AArch64
are now gone.  Bootstrap OK.

On AArch64, this patch finds 600+ new RTL loop invariants across SPEC2K6
floating point, and gives a +4.5% perf improvement on 436.cactusADM
because four new invariants are found in the critical function
"regex_compile".  I guess similar improvements may be achieved on other
RISC backends like powerpc/mips.

One thing to mention: for AArch64, a minor glitch in
aarch64_legitimize_address needs to be fixed to let this patch take
effect; I will send that patch out later as it's a separate issue.
Powerpc/mips don't have this glitch in the LEGITIMIZE_ADDRESS hook, so
they should be OK, and I verified that the base address of the local
array in the testcase given by Seb in pr62173 is hoisted on ppc64 now.
I think pr62173 is fixed on those 64-bit arches by this patch.

Thoughts?

Thanks.

2015-04-21  Jiong Wang  <jiong.w...@arm.com>

gcc/
  * loop-invariant.c (find_defs): Enable DF_DU_CHAIN build.
  (vfp_const_iv): New hash table.
  (expensive_addr_check_p): New boolean.
  (init_inv_motion_data): Initialize new variables.
  (free_inv_motion_data): Release hash table.
  (create_new_invariant): Set cheap_address to false for iv in
  vfp_const_iv table.
  (find_invariant_insn): Skip dependencies check for iv in
  vfp_const_iv table.
  (use_for_single_du): New function.
  (reshuffle_insn_with_vfp): Likewise.
  (find_invariants_bb): Call reshuffle_insn_with_vfp.

gcc/testsuite/
  * gcc.dg/pr62173.c: New testcase.

--
Regards,
Jiong
diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index f79b497..f70dfb0 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -203,6 +203,8 @@ typedef struct invariant *invariant_p;
 
 /* The invariants.  */
 static vec<invariant_p> invariants;
+static hash_table <pointer_hash <rtx_insn> > *vfp_const_iv;
+static bool need_expensive_addr_check_p;
 
 /* Check the size of the invariant table and realloc if necessary.  */
 
@@ -695,7 +697,7 @@ find_defs (struct loop *loop)
   df_remove_problem (df_chain);
   df_process_deferred_rescans ();
-  df_chain_add_problem (DF_UD_CHAIN);
+  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
   df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
   df_analyze_loop (loop);
   check_invariant_table_size ();
@@ -742,6 +744,9 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on,
 	 See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html .  */
       inv->cheap_address = address_cost (SET_SRC (set), word_mode,
 					 ADDR_SPACE_GENERIC, speed) < 3;
+
+      if (need_expensive_addr_check_p && vfp_const_iv->find (insn))
+	inv->cheap_address = false;
     }
   else
     {
@@ -952,7 +957,8 @@ find_invariant_insn (rtx_insn *insn, bool always_reached, bool always_executed)
     return;
 
   depends_on = BITMAP_ALLOC (NULL);
-  if (!check_dependencies (insn, depends_on))
+  if (!vfp_const_iv->find (insn)
+      && !check_dependencies (insn, depends_on))
     {
       BITMAP_FREE (depends_on);
       return;
@@ -1007,6 +1013,180 @@ find_invariants_insn (rtx_insn *insn, bool always_reached, bool always_executed)
   record_uses (insn);
 }
 
+/* Find the single use of the def of INSN.  At the same time, make sure
+   the def of INSN is the single def which reaches that use.
+   NOTE: a debug_insn can affect DF; a var_location debug insn may
+   reference the same rtl expr as the variable location.  Such a
+   debug_insn should not affect the insn shuffling, but we do need to
+   update its loc after the shuffling, so we record such a debug_insn
+   here.  */
+
+static rtx_insn *
+use_for_single_du (rtx_insn *insn, rtx_insn **debug_insn, rtx *debug_expr_loc)
+{
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref def;
+
+  FOR_EACH_INSN_INFO_DEF (def, insn_info)
+    {
+      struct df_link *chain = DF_REF_CHAIN (def);
+
+      if (!chain)
+	return NULL;
+
+      /* For multiple uses, only allow the second use to be a debug_insn.  */
+      if (chain->next
+	  && (chain->next->next
+	      || NONDEBUG_INSN_P (DF_REF_INSN (chain->next->ref))))
+	return NULL;
+
+      if (chain->next)
+	{
+	  rtx_insn *insn = DF_REF_INSN (chain->next->ref);
+	  rtx debug_pat = PATTERN (insn);
+	  if (GET_CODE (debug_pat) != VAR_LOCATION)
+	    return NULL;
+	  else
+	    {
+	      *debug_insn = insn;
+	      *debug_expr_loc = PAT_VAR_LOCATION_LOC (debug_pat);
+	    }
+	}
+
+      df_ref use = chain->ref;
+      chain = DF_REF_CHAIN (use);
+
+      if (!chain
+	  || chain->next)
+	return NULL;
+
+      return DF_REF_INSN (use);
+    }
+
+  return NULL;
+}
+
+/* This function tries to transform
+
+     RA <- fixed_reg + RC
+     RD <- MEM (RA + const_offset)
+
+   into:
+
+     RA <- fixed_reg + const_offset
+     RD <- MEM (RA + RC)      <- pos0
+
+   if the use of RA in the second insn is the single use, and the def of
+   RA in the first insn is the single def reaching the second insn.
+
+   After this change, the first instruction is loop invariant.  */
+
+static void
+reshuffle_insn_with_vfp (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (!set
+      || GET_CODE (SET_SRC (set)) != PLUS)
+    return;
+
+  rtx dest = SET_DEST (set);
+  rtx op0 = XEXP (SET_SRC (set), 0);
+  rtx op1 = XEXP (SET_SRC (set), 1);
+  rtx non_vfp_op = op1;
+
+  if (op1 == frame_pointer_rtx
+      || op1 == stack_pointer_rtx
+      || op1 == virtual_stack_vars_rtx)
+    {
+      non_vfp_op = op0;
+      std::swap (op0, op1);
+    }
+
+  if (GET_CODE (op1) == SIGN_EXTEND
+      || GET_CODE (op1) == ZERO_EXTEND)
+    op1 = XEXP (op1, 0);
+
+  if (!(REG_P (dest) && REG_P (op0) && REG_P (op1)
+	&& (op0 == frame_pointer_rtx
+	    || op0 == stack_pointer_rtx
+	    || op0 == virtual_stack_vars_rtx)))
+    return;
+
+  rtx_insn *use_insn, *debug_insn = NULL;
+  rtx debug_expr_loc;
+
+  if (!(use_insn = use_for_single_du (insn, &debug_insn, &debug_expr_loc)))
+    return;
+
+  rtx u_set = single_set (use_insn);
+
+  if (!(u_set && (REG_P (SET_DEST (u_set)) || MEM_P (SET_DEST (u_set)))))
+    return;
+
+  rtx mem_addr;
+
+  if (REG_P (SET_DEST (u_set)))
+    mem_addr = SET_SRC (u_set);
+  else
+    mem_addr = SET_DEST (u_set);
+
+  if (debug_insn && !rtx_equal_p (debug_expr_loc, mem_addr))
+    return;
+
+  if (GET_CODE (mem_addr) == ZERO_EXTEND
+      || GET_CODE (mem_addr) == SIGN_EXTEND
+      || GET_CODE (mem_addr) == TRUNCATE)
+    mem_addr = XEXP (mem_addr, 0);
+
+  if (!MEM_P (mem_addr)
+      || GET_CODE (XEXP (mem_addr, 0)) != PLUS)
+    return;
+
+  rtx *mem_plus_loc = &XEXP (mem_addr, 0);
+  rtx u_op0 = XEXP (XEXP (mem_addr, 0), 0);
+  rtx u_op1 = XEXP (XEXP (mem_addr, 0), 1);
+
+  if (!(REG_P (u_op0) && CONST_INT_P (u_op1)
+	&& REGNO (dest) == REGNO (u_op0)))
+    return;
+
+  rtx new_src = plus_constant (GET_MODE (op0), op0, INTVAL (u_op1));
+  validate_change (insn, &SET_SRC (set), new_src, 1);
+  new_src = gen_rtx_PLUS (GET_MODE (u_op0), u_op0, non_vfp_op);
+  validate_change (use_insn, mem_plus_loc, new_src, 1);
+  if (debug_insn)
+    validate_change (debug_insn, &XEXP (debug_expr_loc, 0),
+		     copy_rtx (new_src), 1);
+
+  if (apply_change_group ())
+    {
+      rtx_insn **slot = vfp_const_iv->find_slot (insn, INSERT);
+
+      if (*slot)
+	gcc_unreachable ();
+      else
+	*slot = insn;
+
+      need_expensive_addr_check_p = true;
+
+      /* We should update the REG_DEAD info also.  */
+      rtx note;
+      if ((note = find_reg_note (insn, REG_DEAD, non_vfp_op)))
+	{
+	  remove_note (insn, note);
+	  add_shallow_copy_of_reg_note (use_insn, note);
+	}
+
+      if (dump_file)
+	fprintf (dump_file,
+		 "\nRe-associate insn %d and %d for later"
+		 " RTL loop invariant hoisting.\n",
+		 INSN_UID (insn), INSN_UID (use_insn));
+    }
+
+  return;
+}
+
 /* Finds invariants in basic block BB.  ALWAYS_REACHED is true if the
    basic block is always executed.  ALWAYS_EXECUTED is true if the basic
    block is always executed, unless the program ends due to a function
@@ -1022,6 +1197,8 @@ find_invariants_bb (basic_block bb, bool always_reached, bool always_executed)
       if (!NONDEBUG_INSN_P (insn))
 	continue;
 
+      reshuffle_insn_with_vfp (insn);
+
       find_invariants_insn (insn, always_reached, always_executed);
 
       if (always_reached
@@ -1647,6 +1824,9 @@ move_invariants (struct loop *loop)
 static void
 init_inv_motion_data (void)
 {
+  need_expensive_addr_check_p = false;
+  vfp_const_iv = new hash_table <pointer_hash <rtx_insn> > (4);
+
   actual_stamp = 1;
 
   invariants.create (100);
@@ -1682,6 +1862,8 @@ free_inv_motion_data (void)
       free (inv);
     }
   invariants.release ();
+
+  delete vfp_const_iv;
 }
 
 /* Move the invariants out of the LOOP.  */