Thanks for your review.

First of all, this is an optimization.
   gcc do sms before reload, and here each insn use pseudo-register. After 
reload, they are allocated hard-register, then the regrename pass try to adjust 
the register number with def/use chain created by build_def_use.
 As now gcc doesn't consider the VLIW bundles, so regrename pass may updated a 
reg which may not really unused, which will bring in invalid VLIW bundles. 
   Before the final schedule, we usually recheck the validation of VLIW 
bundles, and reschedule the conflicted insns into two VLIW to make them 
validation to avoid above issue, so this is not a correctness issue. 
 Certainly, reschedule the conflicted insns into two VLIW will destroy the 
kernel loop's sms schedule result, and usually it will be harmful to the 
performance.


-----邮件原件-----
发件人: Richard Sandiford [mailto:richard.sandif...@arm.com] 
发送时间: 2020年7月21日 0:05
收件人: Zhongyunde <zhongyu...@huawei.com>
抄送: gcc-patches@gcc.gnu.org; Yangfei (A) <lancelo.y...@huawei.com>
主题: Re: [PATCH PR95696] regrename creates overlapping register allocations for 
vliw

Hi,

Zhongyunde <zhongyu...@huawei.com> writes:
> Hi,
>
> In most target, it is limited to issue two insns with change the same 
> register. So a register is not realy unused if there is another insn, which 
> set the register in the save VLIW.
>
> For example, The insn 73 start with insn:TI, so it will be issued together 
> with others insns until a new insn start with insn:TI, here is insn 243.
>
> The regrename pass known the mode V2VF in insn 73 need two successive 
> registers, i.e. v2 and v3, here is dump snippet before the regrename:
>
>
>
> ======================================================================
> ====================
>
> (insn:TI 73 76 71 4 (set (reg/v:V2VF 37 v2 [orig:180 _62 ] [180])
>
>         (unspec:V2VF [
>
>                 (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
>                 (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
>             ] UNSPEC_HFSQMAG_32X32)) "../test_modify.c":57 710 
> {hfsqmag_v2vf}
>
>      (expr_list:REG_DEAD (reg/v:VHF 43 v8 [orig:210 Dest_value ] 
> [210])
>
>         (expr_list:REG_UNUSED (reg:VHF 38 v3)
>
>             (expr_list:REG_STAGE (const_int 2 [0x2])
>
>                 (expr_list:REG_CYCLE (const_int 2 [0x2])
>
>                     (expr_list:REG_UNITS (const_int 256 [0x100])
>
>                         (nil)))))))
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 43 v8 [orig:265 MEM[(const vfloat32x16 
> *)Src_base_134] ] [265])
>
>         (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 
> MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) 
> "../test_modify.c":56 450 {movvhf_internal}
>
>      (expr_list:REG_STAGE (const_int 1 [0x1])
>
>         (expr_list:REG_CYCLE (const_int 2 [0x2])
>
>             (nil))))
>
>
>
> (insn:TI 243 …
>
>
>
> Then, in the regrename, the insn 71 will be transformed as following code 
> with register v3, and there is an conflict between insn 73 and insn 71 (as 
> both of them set the v3 register).
>
>
>
> Register v2 (2): 73 [SVEC_REGS]
>
> Register v8 (1): 71 [VEC_ALL_REGS]
>
>
>
> ....
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 38 v3 [orig:265 MEM[(const vfloat32x16 
> *)Src_base_134] ] [265])
>
>         (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 
> MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) 
> "../test_modify.c":56 450 {movvhf_internal}
>
>      (expr_list:REG_STAGE (const_int 1 [0x1])
>
>         (expr_list:REG_CYCLE (const_int 2 [0x2])

Do you need this for correctness, or is it “just” an optimisation?

The reason for asking is that regrename runs quite a long time after the first 
scheduling pass, with things like register allocation happening in between.  
The :TI markers on the instructions are therefore going to be very stale by the 
time that regrename runs.

If it's just a heuristic then it might be OK to use them anyway, but it would 
be worth having a comment to say what's going on.

One alternative would be to run the DFA over the instructions to recompute the 
VLIW bundles, but that isn't simple.  It also isn't necessarily a better 
heuristic, since the second scheduling pass is likely to change things anyway.

There's also the problem that :TI markers are used on non-VLIW targets too, 
whereas the problem is really specific to VLIW targets.

Unfortunately, I can't think of any good counter-suggestions at the moment…

Thanks,
Richard

>
> diff --git a/gcc/regrename.c b/gcc/regrename.c index 
> c38173a77..e54794413 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -1614,12 +1614,26 @@ record_out_operands (rtx_insn *insn, bool 
> earlyclobber, insn_rr_info *insn_info)
>    cur_operand = NULL;
>  }
>  
> +/* Get the first real insn of next vliw in current BB.  */ static 
> +rtx_insn * get_next_vliw_first_insn (rtx_insn *cur_insn, basic_block 
> +bb) {
> +  rtx_insn *insn = next_real_insn (cur_insn);
> +
> +  for (; insn && insn != BB_END (bb); insn = next_real_insn (insn))
> +    if (GET_MODE (insn) == TImode)
> +      return insn;
> +
> +  return cur_insn;
> +}
> +
>  /* Build def/use chain.  */
>  
>  static bool
>  build_def_use (basic_block bb)
>  {
>    rtx_insn *insn;
> +  rtx_insn *vliw_start_insn = NULL;
>    unsigned HOST_WIDE_INT untracked_operands;
>  
>    fail_current_block = false;
> @@ -1663,6 +1677,9 @@ build_def_use (basic_block bb)
>            to be marked unrenamable or even cause us to abort the entire
>            basic block.  */
>  
> +       if (GET_MODE (insn) == TImode)
> +         vliw_start_insn = insn;
> +
>         extract_constrain_insn (insn);
>         preprocess_constraints (insn);
>         const operand_alternative *op_alt = which_op_alt (); @@ -1858,17 
> +1875,26 @@ build_def_use (basic_block bb)
>             scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_access,
>                       OP_INOUT);
>  
> -       /* Step 7: Close chains for registers that were never
> -          really used here.  */
> -       for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
> -         if (REG_NOTE_KIND (note) == REG_UNUSED)
> -           {
> -             remove_from_hard_reg_set (&live_hard_regs,
> -                                       GET_MODE (XEXP (note, 0)),
> -                                       REGNO (XEXP (note, 0)));
> -             scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
> -                       OP_IN);
> -           }
> +       /* Step 7: Close chains for registers that were never
> +          really used delayed at the end of vliw.  */
> +       if (vliw_start_insn
> +           && next_real_insn (insn) == get_next_vliw_first_insn (insn, bb))
> +         {
> +           rtx_insn *member;
> +
> +           for (member = vliw_start_insn;
> +                member != next_active_insn (insn);
> +                member = next_active_insn (member))
> +             for (note = REG_NOTES (member); note; note = XEXP (note, 1))
> +               if (REG_NOTE_KIND (note) == REG_UNUSED)
> +                 {
> +                   remove_from_hard_reg_set (&live_hard_regs,
> +                                             GET_MODE (XEXP (note, 0)),
> +                                             REGNO (XEXP (note, 0)));
> +                   scan_rtx (member, &XEXP (note, 0), NO_REGS, 
> terminate_dead,
> +                             OP_IN);
> +                 }
> +         }
>  
>         /* Step 8: Kill the chains involving register restores.  Those
>            should restore _that_ register.  */

Reply via email to