On 21/03/2023 13:42, Andrew Jenner wrote:
This patch gives GCC to use the accumulator VGPR registers on CDNA1 and
later architectures. The backend does not yet attempt to make use of the
matrix acceleration instructions, but the new registers are still useful
as fast space for register spills. And they can now be used in inline
assembly statements.
I haven't written a dedicated testcase for this - just building libgcc
and libgfortran seems to have thoroughly exercised the code paths involved.
I have a test run in progress - assuming that this doesn't find any
breakage, OK to commit?
gcc/ChangeLog:
* config/gcn/constraints.md: Add AVGPR constraints.
* config/gcn/gcn-valu.md (*mov<mode>, mov<mode>_sgprbase)
(reload_in<mode>, reload_out<mode>): Add AVGPR alternatives.
(gather<mode>_insn_1offset<exec>, gather<mode>_insn_1offset_ds<exec>)
(gather<mode>_insn_2offsets<exec>)
(scatter_store<mode>_insn_1offset<exec_scatter)
(scatter<mode>_insn_1offset_ds<exec_scatter>)
(scatter<mode>_insn_2offsets<exec_scatter>): Allow use of AVGPRs.
* config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define.
(gcn_class_max_nregs): Handle AVGPR_REGS.
(gcn_hard_regno_mode_ok): Likewise.
(gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA2_PLUS.
(gcn_sgpr_move_p): Handle AVGPRs.
(gcn_secondary_reload): Reload AVGPRs via VGPRs.
(gcn_conditional_register_usage): Handle AVGPRs.
(gcn_vgpr_equivalent_register_operand): New function.
(gcn_valid_move_p): Check for validity of AVGPR moves.
(gcn_memory_move_cost): Handle AVGPRs.
(gcn_register_move_cost): Liekwise.
(gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI.
(gcn_hsa_declare_function_name): Handle AVGPRs.
(print_reg): Likewise.
(gcn_dwarf_register_numbe): Likewise.
* config/gcn/gcn.h (FIRST_AVGPR_REG, AVGPR_REGNO, LAST_AVGPR_REG):
Define.
(SOFT_ARG_REG, FRAME_POINTER_REGNUM, DWARF_LINK_REGISTER)
(FIRST_PSEUDO_REGISTER): Update.
(AVGPR_REGNO_P): Define.
(FIXED_REGISTERS, CALL_USED_REGISTERS): Add AVGPRs.
(enum reg_class, REG_CLASS_NAMES): Add AVGPR_REGS and ALL_VGPR_REGS.
(REG_CLASS_CONTENTS): Add new register classes and add entries for
AVGPRs to all classes.
(REGISTER_NAMES): Add AVGPRs.
* config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define.
(AP_REGNUM, FP_REGNUM): Update.
(define_attr "type"): Add vop3p_mai.
(*mov<mode>_insn, *movti_insn): Add AVGPR alternatives.
* gcc/config/gcn/predicates.md (gcn_avgpr_register_operand)
(gcn_avgpr_hard_register_operand): New predicates.
I don't like the "a" and "b" constraints. It feels error prone and we
don't use that sort of conditional on any other constraint.
Please global replace the "b" constraint with "a", and then set the
"gcn_version" attribute to "cdna1only" and "cdna2" on each alternative
of each insn that previously used "a" and "b". I think you'll need an
extra alternative for the load/store insns.
@@ -801,7 +804,7 @@ gcn_spill_class (reg_class_t c, machine_mode /*mode */ )
|| c == VCC_CONDITIONAL_REG || c == EXEC_MASK_REG)
return SGPR_REGS;
else
- return NO_REGS;
+ return c == VGPR_REGS && TARGET_CDNA2_PLUS ? AVGPR_REGS : NO_REGS;
}
Shouldn't that be CDNA1?
@@ -2524,6 +2539,16 @@ gcn_conditional_register_usage (void)
fixed_regs[cfun->machine->args.reg[WORK_ITEM_ID_Z_ARG]] = 1;
}
+static bool
+gcn_vgpr_equivalent_register_operand (rtx x, machine_mode mode)
+{
Comment before the function please.
@@ -316,6 +354,8 @@ enum reg_class
SGPR_SRC_REGS,
GENERAL_REGS,
VGPR_REGS,
+ AVGPR_REGS,
+ ALL_VGPR_REGS,
ALL_GPR_REGS,
SRCDST_REGS,
AFP_REGS,
What is ALL_VGPR_REGS for?
@@ -530,9 +538,9 @@
(define_insn "*mov<mode>_insn"
[(set (match_operand:SISF 0 "nonimmediate_operand"
- "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v, v,RF,v,RLRG, v,SD, v,RM")
+ "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v,vb,RF,v,RLRG, v,SD,vb,RM, v, a,
b")
(match_operand:SISF 1 "gcn_load_operand"
- "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF, v,B, v,RLRG, Y,RM, v"))]
+ "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF,vb,B, v,RLRG, Y,RM,vb,^a, v,
b"))]
These lines are now too long. In this backend we have adopted a style
that has the constraints right justified such that the longest line has
the last character in the right-most column, and the shorter lines have
the alternatives aligned.
There are other similar long lines further down the patch.
@@ -580,19 +591,22 @@
ds_write%b0\t%A0, %1%O0\;s_waitcnt\tlgkmcnt(0)
ds_read%u1\t%0, %A1%O1\;s_waitcnt\tlgkmcnt(0)
global_load%o1\t%0, %A1%O1%g1\;s_waitcnt\tvmcnt(0)
- global_store%s0\t%A0, %1%O0%g0"
+ global_store%s0\t%A0, %1%O0%g0
+ v_accvgpr_read_b32\t%0, %1
+ v_accvgpr_write_b32\t%0, %1
+ v_accvgpr_mov_b32\t%0, %1"
[(set_attr "type"
- "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat")
- (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*")
- (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12")])
+ "sop1,sopk,sop1,vop1,vop3a,vop3a,flat,flat,vop1,ds,ds,flat,flat,
vop3p_mai,vop3p_mai,vop1")
+ (set_attr "exec" "*,*,*,*,none,none,*,*,*,*,*,*,*,*,*,*")
+ (set_attr "length" "4,4,8,4,4,4,12,12,8,12,12,12,12,8,8,4")])
You can break the long attribute strings after a comma.
Andrew