On 21/03/2023 13:42, Andrew Jenner wrote:
This patch gives GCC to use the accumulator VGPR registers on CDNA1 and later architectures. The backend does not yet attempt to make use of the matrix acceleration instructions, but the new registers are still useful as fast space for register spills. And they can now be used in inline assembly statements.

I haven't written a dedicated testcase for this - just building libgcc and libgfortran seems to have thoroughly exercised the code paths involved.

I have a test run in progress - assuming that this doesn't find any breakage, OK to commit?

gcc/ChangeLog:

     * config/gcn/constraints.md: Add AVGPR constraints.
     * config/gcn/gcn-valu.md (*mov<mode>, mov<mode>_sgprbase)
     (reload_in<mode>, reload_out<mode>): Add AVGPR alternatives.
     (gather<mode>_insn_1offset<exec>, gather<mode>_insn_1offset_ds<exec>)
     (gather<mode>_insn_2offsets<exec>)
     (scatter_store<mode>_insn_1offset<exec_scatter)
     (scatter<mode>_insn_1offset_ds<exec_scatter>)
     (scatter<mode>_insn_2offsets<exec_scatter>): Allow use of AVGPRs.
     * config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define.
     (gcn_class_max_nregs): Handle AVGPR_REGS.
     (gcn_hard_regno_mode_ok): Likewise.
     (gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA2_PLUS.
     (gcn_sgpr_move_p): Handle AVGPRs.
     (gcn_secondary_reload): Reload AVGPRs via VGPRs.
     (gcn_conditional_register_usage): Handle AVGPRs.
     (gcn_vgpr_equivalent_register_operand): New function.
     (gcn_valid_move_p): Check for validity of AVGPR moves.
     (gcn_memory_move_cost): Handle AVGPRs.
     (gcn_register_move_cost): Liekwise.
     (gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI.
     (gcn_hsa_declare_function_name): Handle AVGPRs.
     (print_reg): Likewise.
     (gcn_dwarf_register_numbe): Likewise.
     * config/gcn/gcn.h (FIRST_AVGPR_REG, AVGPR_REGNO, LAST_AVGPR_REG):
     Define.
     (SOFT_ARG_REG, FRAME_POINTER_REGNUM, DWARF_LINK_REGISTER)
     (FIRST_PSEUDO_REGISTER): Update.
     (AVGPR_REGNO_P): Define.
     (FIXED_REGISTERS, CALL_USED_REGISTERS): Add AVGPRs.
     (enum reg_class, REG_CLASS_NAMES): Add AVGPR_REGS and ALL_VGPR_REGS.
     (REG_CLASS_CONTENTS): Add new register classes and add entries for
     AVGPRs to all classes.
     (REGISTER_NAMES): Add AVGPRs.
     * config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define.
     (AP_REGNUM, FP_REGNUM): Update.
     (define_attr "type"): Add vop3p_mai.
     (*mov<mode>_insn, *movti_insn): Add AVGPR alternatives.
     * gcc/config/gcn/predicates.md (gcn_avgpr_register_operand)
     (gcn_avgpr_hard_register_operand): New predicates.

I don't like the "a" and "b" constraints. It feels error prone and we don't use that sort of conditional on any other constraint.

Please global replace the "b" constraint with "a", and then set the "gcn_version" attribute to "cdna1only" and "cdna2" on each alternative of each insn that previously used "a" and "b". I think you'll need an extra alternative for the load/store insns.

@@ -801,7 +804,7 @@ gcn_spill_class (reg_class_t c, machine_mode /*mode */ )
       || c == VCC_CONDITIONAL_REG || c == EXEC_MASK_REG)
     return SGPR_REGS;
   else
-    return NO_REGS;
+    return c == VGPR_REGS && TARGET_CDNA2_PLUS ? AVGPR_REGS : NO_REGS;
 }

Shouldn't that be CDNA1?

@@ -2524,6 +2539,16 @@ gcn_conditional_register_usage (void)
     fixed_regs[cfun->machine->args.reg[WORK_ITEM_ID_Z_ARG]] = 1;
 }
+static bool
+gcn_vgpr_equivalent_register_operand (rtx x, machine_mode mode)
+{

Comment before the function please.

@@ -316,6 +354,8 @@ enum reg_class
   SGPR_SRC_REGS,
   GENERAL_REGS,
   VGPR_REGS,
+  AVGPR_REGS,
+  ALL_VGPR_REGS,
   ALL_GPR_REGS,
   SRCDST_REGS,
   AFP_REGS,

What is ALL_VGPR_REGS for?

@@ -530,9 +538,9 @@
(define_insn "*mov<mode>_insn"
   [(set (match_operand:SISF 0 "nonimmediate_operand"
-                 "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v, v,RF,v,RLRG,   v,SD, v,RM")
+                 "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v,vb,RF,v,RLRG,   v,SD,vb,RM, v, a, 
b")
        (match_operand:SISF 1 "gcn_load_operand"
-                 "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF, v,B,   v,RLRG, Y,RM, v"))]
+                 "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF,vb,B,   v,RLRG, Y,RM,vb,^a, v, 
b"))]

These lines are now too long. In this backend we have adopted a style that has the constraints right justified such that the longest line has the last character in the right-most column, and the shorter lines have the alternatives aligned.

There are other similar long lines further down the patch.

@@ -580,19 +591,22 @@
   ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
   ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
   global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
-  global_store%s0\t%A0, %1%O0%g0"
+  global_store%s0\t%A0, %1%O0%g0
+  v_accvgpr_read_b32\t%0, %1
+  v_accvgpr_write_b32\t%0, %1
+  v_accvgpr_mov_b32\t%0, %1"
   [(set_attr "type"
-            "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat")
-   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*")
-   (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12")])
+            "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat, 
vop3p_mai,vop3p_mai,vop1")
+   (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*,*,*,*")
+   (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12,8,8,4")])

You can break the long attribute strings after a comma.

Andrew

Reply via email to