llvmbot wrote:
@llvm/pr-subscribers-llvm-globalisel

Author: Dhruva Chakrabarti (dhruvachak)

<details>
<summary>Changes</summary>

This patch changes the default value of the -amdgpu-use-amdgpu-trackers option from false to true. The LIT tests have been updated in one of the following ways:

(1) If the test did not pass the option and its checks were auto-generated, the checks have been regenerated.
(2) If the test did not pass the option and was not auto-generated, -amdgpu-use-amdgpu-trackers=0 has been added so that the test keeps checking the same output it checked before.
(3) If the test already passed the option, its value has been updated to reflect the change in the default.

Currently, there are 4 tests in category (2), all under CodeGen/AMDGPU/:

- addrspacecast.ll
- schedule-regpressure-limit.ll
- schedule-regpressure-limit2.ll
- sema-v-unsched-bundle.ll

There are 8 tests in category (3), all under CodeGen/AMDGPU/:

- schedule-amdgpu-tracker-physreg.ll
- schedule-amdgpu-trackers.ll
- materialize-frame-index-sgpr.ll
- schedule-relaxed-occupancy.ll
- schedule-regpressure-ilp-metric-spills.mir
- pr51516.mir
- high-RP-reschedule.mir
- machine-scheduler-sink-trivial-remats.mir

The rest are in category (1).

This PR is stacked on top of https://github.com/llvm/llvm-project/pull/184275.
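The three test categories rest on how a boolean option's default interacts with an explicit setting on the command line. The following is a minimal, self-contained C++ model of that interaction. It is not LLVM's cl::opt implementation: `BoolOption`, `effectiveValue`, `TestUpdate`, `classify`, and `demo` are hypothetical names introduced only for illustration, and the argument lists are example inputs rather than RUN lines taken from the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a hidden boolean option such as
// -amdgpu-use-amdgpu-trackers. This is NOT llvm::cl::opt; it only models
// "use the default unless the flag is given explicitly".
struct BoolOption {
  std::string Name; // option name without the leading '-'
  bool Default;     // plays the role of cl::init(...)

  // Returns the value the option takes for a given argument list.
  // In this sketch the last explicit setting wins.
  bool effectiveValue(const std::vector<std::string> &Args) const {
    const std::string Flag = "-" + Name;
    std::optional<bool> Explicit;
    for (const std::string &Arg : Args) {
      if (Arg == Flag || Arg == Flag + "=1" || Arg == Flag + "=true")
        Explicit = true;
      else if (Arg == Flag + "=0" || Arg == Flag + "=false")
        Explicit = false;
    }
    return Explicit.value_or(Default);
  }
};

// Hypothetical helper mirroring the three test-update categories.
enum class TestUpdate { Regenerate, PinOldDefault, UpdateExplicitValue };

TestUpdate classify(bool HasOption, bool AutoGenerated) {
  if (HasOption)
    return TestUpdate::UpdateExplicitValue; // category (3)
  if (AutoGenerated)
    return TestUpdate::Regenerate;          // category (1)
  return TestUpdate::PinOldDefault;         // category (2)
}

// Flipping the default changes invocations that omit the flag, but not
// invocations that pin it with "=0".
void demo() {
  const BoolOption OldDefault{"amdgpu-use-amdgpu-trackers", false};
  const BoolOption NewDefault{"amdgpu-use-amdgpu-trackers", true};
  const std::vector<std::string> NoFlag = {"llc", "-mtriple=amdgcn"};
  const std::vector<std::string> Pinned = {"llc", "-mtriple=amdgcn",
                                           "-amdgpu-use-amdgpu-trackers=0"};
  assert(OldDefault.effectiveValue(NoFlag) != NewDefault.effectiveValue(NoFlag));
  assert(OldDefault.effectiveValue(Pinned) == NewDefault.effectiveValue(Pinned));
}
```

In this model, changing `Default` only affects invocations that omit the flag. That is why tests in category (1) need regenerated checks, while tests in category (2) keep their previous output by passing =0 explicitly.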
Assisted-by: Cursor

---

Patch is 21.25 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/184400.diff

162 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll (+105-105)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll (+52-52)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll (+369-387)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll (+223-223)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll (+217-217)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll (+80-80)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll (+138-138)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/load-uniform-in-vgpr.ll (+26-27)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll (+198-183)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll (+759-759)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll (+196-196)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll (+987-987)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll (+219-219)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll (+243-243)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll (+220-220)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll (+24-24)
- (modified) llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/a-v-global-atomicrmw.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/abs_i16.ll (+15-15)
- (modified) llvm/test/CodeGen/AMDGPU/add.ll (+32-32)
- (modified) llvm/test/CodeGen/AMDGPU/addrspacecast.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+138-58)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll (+73457-73030)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll (+95-97)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll (+3794-3764)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll (+192-192)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll (+1299-1275)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll (+229-229)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll (+481-469)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.448bit.ll (+568-536)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll (+12941-12966)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.576bit.ll (+1593-1561)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.640bit.ll (+1690-1649)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.704bit.ll (+2710-2648)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll (+1904-1808)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll (+2318-2175)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll (+3632-3520)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll (+4379-4290)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll (+28-28)
- (modified) llvm/test/CodeGen/AMDGPU/av-split-dead-valno-crash.ll (+26-28)
- (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+9707-9179)
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-contents-legalization.ll (+61-63)
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-memcpy.ll (+36-36)
- (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+706-417)
- (modified) llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir (+21-21)
- (modified) llvm/test/CodeGen/AMDGPU/div_i128.ll (+271-271)
- (modified) llvm/test/CodeGen/AMDGPU/div_v2i128.ll (+933-933)
- (modified) llvm/test/CodeGen/AMDGPU/extract-subvector.ll (+28-28)
- (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize.bf16.ll (+72-73)
- (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize.f16.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/fceil64.ll (+311-313)
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.bf16.ll (+155-176)
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll (+82-105)
- (modified) llvm/test/CodeGen/AMDGPU/fmax_legacy.f16.ll (+36-36)
- (modified) llvm/test/CodeGen/AMDGPU/fmaximum.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/fmin_legacy.f16.ll (+36-36)
- (modified) llvm/test/CodeGen/AMDGPU/fminimum.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/fptoi.i128.ll (+62-62)
- (modified) llvm/test/CodeGen/AMDGPU/freeze.ll (+86-79)
- (modified) llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll (+200-200)
- (modified) llvm/test/CodeGen/AMDGPU/function-args.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/function-returns.ll (+229-229)
- (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll (+59-63)
- (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll (+526-327)
- (modified) llvm/test/CodeGen/AMDGPU/half.ll (+233-231)
- (modified) llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll (+56-89)
- (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll (+480-485)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+67-68)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2bf16.ll (+140-140)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll (+138-138)
- (modified) llvm/test/CodeGen/AMDGPU/insert_waitcnt_for_precise_memory.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll (+114-114)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir (+808-807)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.small.mir (+361-361)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.single.2c.mir (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+10-12)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.tensor.load.store.ll (+91-180)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll (+1576-1559)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll (+1564-1558)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp2.f64.ll (+1333-1327)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.fma.f16.ll (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f16.ll (+188-188)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f32.ll (+147-147)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f64.ll (+471-372)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f16.ll (+73-73)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f32.ll (+147-147)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f64.ll (+471-372)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll (+39-40)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i1.ll (+2122-1778)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i16.ll (+1556-1566)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i32.ll (+472-466)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i64.ll (+48-47)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i8.ll (+1171-1178)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i16.ll (+1655-1757)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i32.ll (+649-789)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i8.ll (+1592-1665)
- (modified) llvm/test/CodeGen/AMDGPU/load-local-i16.ll (+2691-2901)
- (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-rematerialization-scoring.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/maximumnum.bf16.ll (+2352-2152)
- (modified) llvm/test/CodeGen/AMDGPU/maximumnum.ll (+506-535)
- (modified) llvm/test/CodeGen/AMDGPU/memcpy-libcall.ll (+92-90)
- (modified) llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll (+1274-1271)
- (modified) llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll (+7-11)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-cd-select.ll (+30-36)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-no-register-aliasing.ll (+20-20)
- (modified) llvm/test/CodeGen/AMDGPU/minimumnum.bf16.ll (+2386-2186)
- (modified) llvm/test/CodeGen/AMDGPU/minimumnum.ll (+506-535)
- (modified) llvm/test/CodeGen/AMDGPU/mul.ll (+39-39)
- (modified) llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/packed-fp32.ll (+364-356)
- (modified) llvm/test/CodeGen/AMDGPU/pr51516.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+16-16)
- (modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+104-103)
- (modified) llvm/test/CodeGen/AMDGPU/regpressure_printer.mir (+64-50)
- (modified) llvm/test/CodeGen/AMDGPU/rem_i128.ll (+175-175)
- (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll (+15-18)
- (modified) llvm/test/CodeGen/AMDGPU/rsq.f64.ll (+782-785)
- (modified) llvm/test/CodeGen/AMDGPU/sched-assert-dead-def-subreg-use-other-subreg.mir (+10-10)
- (modified) llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-subreg-def-across-subreg-def.mir (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_copies.mir (+780-780)
- (modified) llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_cost.mir (+62-62)
- (modified) llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_diff_types.mir (+20-20)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-barrier.mir (+13-13)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-regpressure-ilp-metric-spills.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit.ll (+3-3)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit2.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-relaxed-occupancy.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/scratch-simple.ll (+452-438)
- (modified) llvm/test/CodeGen/AMDGPU/sdiv.ll (+210-210)
- (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll (+7-7)
- (modified) llvm/test/CodeGen/AMDGPU/select.f16.ll (+476-512)
- (modified) llvm/test/CodeGen/AMDGPU/sema-v-unsched-bundle.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/shl.ll (+11-11)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll (+459-459)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll (+119-129)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll (+119-129)
- (modified) llvm/test/CodeGen/AMDGPU/spill-agpr.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/sra.ll (+31-31)
- (modified) llvm/test/CodeGen/AMDGPU/srem.ll (+113-113)
- (modified) llvm/test/CodeGen/AMDGPU/srl.ll (+11-11)
- (modified) llvm/test/CodeGen/AMDGPU/ssubsat.ll (+96-96)
- (modified) llvm/test/CodeGen/AMDGPU/stack-realign.ll (+4-8)
- (modified) llvm/test/CodeGen/AMDGPU/uaddsat.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/udiv.ll (+20-20)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-add.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-and.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-mul.ll (+332-318)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-or.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-smax.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-smin.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-umax.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-umin.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-xor.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vni8-across-blocks.ll (+36-35)

``````````diff
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 6685df3de7d22..127acf1c5513b 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -76,7 +76,7 @@ static cl::opt<bool>
 static cl::opt<bool> GCNTrackers(
     "amdgpu-use-amdgpu-trackers", cl::Hidden,
     cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
-    cl::init(false));
+    cl::init(true));
 
 static cl::opt<bool> TrackPhysRegInTrackers(
     "amdgpu-trackers-physical-register-tracking", cl::Hidden,
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
index b754bf0071da8..c7375768a831e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
@@ -370,62 +370,62 @@ define void @addv_7i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addrs
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
 ; GFX8-NEXT: flat_load_ushort v17, v[6:7]
 ; GFX8-NEXT: flat_load_ushort v18, v[8:9]
-; GFX8-NEXT: flat_load_ushort v19, v[10:11]
-; GFX8-NEXT: flat_load_ushort v20, v[12:13]
-; GFX8-NEXT: flat_load_ushort v21, v[14:15]
-; GFX8-NEXT: flat_load_ushort v22, v[0:1]
+; GFX8-NEXT: flat_load_ushort v10, v[10:11]
+; GFX8-NEXT: flat_load_ushort v11, v[12:13]
+; GFX8-NEXT: flat_load_ushort v12, v[14:15]
+; GFX8-NEXT: flat_load_ushort v13, v[0:1]
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v2
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
 ; GFX8-NEXT: v_add_u32_e32 v6, vcc, 4, v2
 ; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v8, vcc, 6, v2
+; GFX8-NEXT: flat_load_ushort v14, v[2:3]
+; GFX8-NEXT: flat_load_ushort v15, v[0:1]
+; GFX8-NEXT: flat_load_ushort v19, v[6:7]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 6, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: v_add_u32_e32 v6, vcc, 8, v2
+; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v3, vcc
+; GFX8-NEXT: v_add_u32_e32 v8, vcc, 10, v2
 ; GFX8-NEXT: v_addc_u32_e32 v9, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v10, vcc, 8, v2
-; GFX8-NEXT: v_addc_u32_e32 v11, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v12, vcc, 10, v2
-; GFX8-NEXT: v_addc_u32_e32 v13, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v14, vcc, 12, v2
-; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_ushort v2, v[2:3]
-; GFX8-NEXT: flat_load_ushort v3, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v2, vcc, 12, v2
+; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_ushort v20, v[0:1]
 ; GFX8-NEXT: flat_load_ushort v6, v[6:7]
 ; GFX8-NEXT: flat_load_ushort v7, v[8:9]
-; GFX8-NEXT: flat_load_ushort v8, v[10:11]
-; GFX8-NEXT: flat_load_ushort v9, v[12:13]
-; GFX8-NEXT: flat_load_ushort v10, v[14:15]
+; GFX8-NEXT: flat_load_ushort v2, v[2:3]
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v4
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
 ; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v2, v16, v2
+; GFX8-NEXT: v_add_u16_e32 v3, v16, v14
 ; GFX8-NEXT: s_waitcnt vmcnt(5)
-; GFX8-NEXT: v_add_u16_e32 v3, v17, v3
-; GFX8-NEXT: flat_store_short v[4:5], v2
-; GFX8-NEXT: flat_store_short v[0:1], v3
+; GFX8-NEXT: v_add_u16_e32 v8, v17, v15
+; GFX8-NEXT: flat_store_short v[4:5], v3
+; GFX8-NEXT: flat_store_short v[0:1], v8
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 4, v4
 ; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v6, v18, v6
+; GFX8-NEXT: v_add_u16_e32 v9, v18, v19
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v6
+; GFX8-NEXT: flat_store_short v[0:1], v9
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 6, v4
-; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v7, v19, v7
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v7
+; GFX8-NEXT: s_waitcnt vmcnt(6)
+; GFX8-NEXT: v_add_u16_e32 v10, v10, v20
+; GFX8-NEXT: flat_store_short v[0:1], v10
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 8, v4
 ; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v8, v20, v8
+; GFX8-NEXT: v_add_u16_e32 v6, v11, v6
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v8
+; GFX8-NEXT: flat_store_short v[0:1], v6
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 10, v4
 ; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v9, v21, v9
+; GFX8-NEXT: v_add_u16_e32 v7, v12, v7
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v9
+; GFX8-NEXT: flat_store_short v[0:1], v7
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 12, v4
 ; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v10, v22, v10
+; GFX8-NEXT: v_add_u16_e32 v2, v13, v2
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v10
+; GFX8-NEXT: flat_store_short v[0:1], v2
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
 ; GFX8-NEXT: s_setpc_b64 s[30:31]
 ;
@@ -532,29 +532,29 @@ define void @add_v9i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addrs
 ; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v0
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT: flat_load_ushort v14, v[0:1]
+; GFX8-NEXT: flat_load_ushort v16, v[0:1]
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v2
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
 ; GFX8-NEXT: flat_load_ushort v0, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v4
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v5, vcc
 ; GFX8-NEXT: s_waitcnt vmcnt(2)
 ; GFX8-NEXT: v_add_u16_e32 v1, v6, v10
 ; GFX8-NEXT: v_add_u16_sdwa v2, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
 ; GFX8-NEXT: v_add_u16_e32 v3, v7, v11
-; GFX8-NEXT: v_add_u16_sdwa v10, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v11, v8, v12
+; GFX8-NEXT: v_add_u16_sdwa v6, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v7, v8, v12
 ; GFX8-NEXT: v_add_u16_sdwa v8, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v12, v9, v13
+; GFX8-NEXT: v_add_u16_e32 v10, v9, v13
 ; GFX8-NEXT: v_add_u16_sdwa v9, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u32_e32 v6, vcc, 16, v4
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_add_u16_e32 v13, v14, v0
+; GFX8-NEXT: v_add_u16_e32 v11, v16, v0
 ; GFX8-NEXT: v_or_b32_e32 v0, v1, v2
-; GFX8-NEXT: v_or_b32_e32 v1, v3, v10
-; GFX8-NEXT: v_or_b32_e32 v2, v11, v8
-; GFX8-NEXT: v_or_b32_e32 v3, v12, v9
-; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v5, vcc
+; GFX8-NEXT: v_or_b32_e32 v1, v3, v6
+; GFX8-NEXT: v_or_b32_e32 v2, v7, v8
+; GFX8-NEXT: v_or_b32_e32 v3, v10, v9
 ; GFX8-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT: flat_store_short v[6:7], v13
+; GFX8-NEXT: flat_store_short v[14:15], v11
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
 ; GFX8-NEXT: s_setpc_b64 s[30:31]
 ;
@@ -685,55 +685,55 @@ define void @add_v11i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addr
 ; GFX8-LABEL: add_v11i16:
 ; GFX8: ; %bb.0:
 ; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v0
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v1, vcc
+; GFX8-NEXT: v_add_u32_e32 v16, vcc, 18, v0
+; GFX8-NEXT: v_addc_u32_e32 v17, vcc, 0, v1, vcc
 ; GFX8-NEXT: flat_load_dwordx4 v[6:9], v[0:1]
-; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
-; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v2
-; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v16, vcc, 18, v2
-; GFX8-NEXT: v_addc_u32_e32 v17, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v2, vcc, 20, v2
-; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_ushort v14, v[14:15]
-; GFX8-NEXT: flat_load_ushort v15, v[16:17]
-; GFX8-NEXT: flat_load_ushort v16, v[2:3]
-; GFX8-NEXT: v_add_u32_e32 v2, vcc, 16, v0
-; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
-; GFX8-NEXT: s_waitcnt vmcnt(3)
-; GFX8-NEXT: v_add_u16_e32 v17, v6, v10
-; GFX8-NEXT: v_add_u16_sdwa v10, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u32_e32 v6, vcc, 18, v0
-; GFX8-NEXT: v_add_u16_e32 v18, v7, v11
-; GFX8-NEXT: v_add_u16_sdwa v11, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v1, vcc
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 20, v0
-; GFX8-NEXT: flat_load_ushort v2, v[2:3]
-; GFX8-NEXT: flat_load_ushort v3, v[6:7]
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT: flat_load_ushort v21, v[0:1]
-; GFX8-NEXT: v_add_u32_e32 v6, vcc, 16, v4
-; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v5, vcc
-; GFX8-NEXT: v_add_u16_e32 v19, v8, v12
+; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
+; GFX8-NEXT: flat_load_ushort v18, v[14:15]
+; GFX8-NEXT: flat_load_ushort v16, v[16:17]
+; GFX8-NEXT: flat_load_ushort v17, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 18, v2
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_ushort v19, v[0:1]
+; GFX8-NEXT: flat_load_ushort v20, v[14:15]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 20, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_ushort v0, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v4
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v5, vcc
+; GFX8-NEXT: s_waitcnt vmcnt(6)
+; GFX8-NEXT: v_add_u16_e32 v1, v6, v10
+; GFX8-NEXT: v_add_u16_sdwa v2, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u32_e32 v6, vcc, 18, v4
+; GFX8-NEXT: v_add_u16_e32 v3, v7, v11
+; GFX8-NEXT: v_add_u16_sdwa v10, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v11, v8, v12
 ; GFX8-NEXT: v_add_u16_sdwa v12, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u32_e32 v8, vcc, 18, v4
-; GFX8-NEXT: v_add_u16_e32 v20, v9, v13
+; GFX8-NEXT: v_add_u16_e32 v21, v9, v13
 ; GFX8-NEXT: v_add_u16_sdwa v13, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_addc_u32_e32 v9, vcc, 0, v5, vcc
-; GFX8-NEXT: v_or_b32_e32 v0, v17, v10
-; GFX8-NEXT: v_or_b32_e32 v1, v18, v11
-; GFX8-NEXT: v_add_u32_e32 v10, vcc, 20, v4
-; GFX8-NEXT: v_addc_u32_e32 v11, vcc, 0, v5, vcc
+; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v5, vcc
+; GFX8-NEXT: v_add_u32_e32 v8, vcc, 20, v4
 ; GFX8-NEXT: s_waitcnt vmcnt(2)
-; GFX8-NEXT: v_add_u16_e32 v14, v2, v14
+; GFX8-NEXT: v_add_u16_e32 v18, v18, v19
 ; GFX8-NEXT: s_waitcnt vmcnt(1)
-; GFX8-NEXT: v_add_u16_e32 v15, v3, v15
-; GFX8-NEXT: v_or_b32_e32 v2, v19, v12
-; GFX8-NEXT: v_or_b32_e32 v3, v20, v13
+; GFX8-NEXT: v_add_u16_e32 v16, v16, v20
+; GFX8-NEXT: v_addc_u32_e32 v9, vcc, 0, v5, vcc
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_add_u16_e32 v16, v21, v16
+; GFX8-NEXT: v_add_u16_e32 v17, v17, v0
+; GFX8-NEXT: v_or_b32_e32 v0, v1, v2
+; GFX8-NEXT: v_or_b32_e32 v1, v3, v10
+; GFX8-NEXT: v_or_b32_e32 v2, v11, v12
+; GFX8-NEXT: v_or_b32_e32 v3, v21, v13
 ; GFX8-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT: flat_store_short v[6:7], v14
-; GFX8-NEXT: flat_store_short v[8:9], v15
-; GFX8-NEXT: flat_store_short v[10:11], v16
+; GFX8-NEXT: flat_store_short v[14:15], v18
+; GFX8-NEXT: flat_store_short v[6:7], v16
+; GFX8-NEXT: flat_store_short v[8:9], v17
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
 ; GFX8-NEXT: s_setpc_b64 s[30:31]
 ;
@@ -825,34 +825,34 @@ define void @add_v12i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addr
 ; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT: flat_load_dwordx4 v[6:9], v[0:1]
 ; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
-; GFX8-NEXT: v_add_u32_e32 v2, vcc, 16, v2
-; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v0
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT: flat_load_dwordx2 v[14:15], v[2:3]
-; GFX8-NEXT: s_waitcnt vmcnt(1)
-; GFX8-NEXT: v_add_u16_e32 v2, v6, v10
-; GFX8-NEXT: v_add_u16_sdwa v3, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v10, v7, v11
-; GFX8-NEXT: v_add_u16_sdwa v11, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: flat_load_dwordx2 v[6:7], v[0:1]
-; GFX8-NEXT: v_add_u16_e32 v16, v8, v12
-; GFX8-NEXT: v_add_u16_sdwa v8, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v12, v9, v13
+; GFX8-NEXT: flat_load_dwordx2 v[14:15], v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_dwordx2 v[16:17], v[0:1]
+; GFX8-NEXT: s_waitcnt vmcnt(2)
+; GFX8-NEXT: v_add_u16_e32 v0, v6, v10
+; GFX8-NEXT: v_add_u16_sdwa v1, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v2, v7, v11
+; GFX8-NEXT: v_add_u16_sdwa v3, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v6, v8, v12
+; GFX8-NEXT: v_add_u16_sdwa v7, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v8, v9, v13
 ; GFX8-NEXT: v_add_u16_sdwa v9, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_or_b32_e32 v0, v2, v3
-; GFX8-NEXT: v_or_b32_e32 v1, v10, v11
-; GFX8-NEXT: v_or_b32_e32 v2, v16, v8
-; GFX8-NEXT: v_or_b32_e32 v3, v12, v9
+; GFX8-NEXT: v_or_b32_e32 v0, v0, v1
+; GFX8-NEXT: v_or_b32_e32 v1, v2, v3
+; GFX8-NEXT: v_or_b32_e32 v2, v6, v7
+; GFX8-NEXT: v_or_b32_e32 v3, v8, v9
+; GFX8-NEXT: s_waitcnt vmcnt(0)
+; GFX8-NEXT: v_add_u16_e32 v6, v14, v16
+; GFX8-NEXT: v_add_u16_sdwa v7, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v8, v15, v17
+; GFX8-NEXT: v_add_u16_sdwa v9, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
 ; GFX8-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT: s_waitcnt vmcnt(1)
-; GFX8-NEXT: v_add_u16_e32 v8, v6, v14
-; GFX8-NEXT: v_add_u16_sdwa v6, v6, v14 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v9, v7, v15
-; GFX8-NEXT: v_add_u16_sdwa v7, v7, v15 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_or_b32_e32 v6, v6, v7
 ; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v4
-; GFX8-NEXT: v_or_b32_e32 v6, v8, v6
-; GFX8-NEXT: v_or_b32_e32 v7, v9, v7
+; GFX8-NEXT: v_or_b32_e32 v7, v8, v9
 ; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
 ; GFX8-NEXT: flat_store_dwordx2 v[0:1], v[6:7]
 ; GFX8-NEXT: s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
index 8183a4dec10ca..f773983ef0f01 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
@@ -699,24 +699,24 @@ define <4 x double> @test_f64_add_mul(<4 x double> %a, <4 x double> %b, <4 x dou
 ; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX9-CONTRACT-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
 ; GFX9-CONTRACT-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:8
-; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
+; GFX9-CONTRACT-NEXT: buffer_load_dword v33, off, s[0:3], s32 offset:12
+; GFX9-CONTRACT-NEXT: buffer_load_dword v34, off, s[0:3], s32 offset:16
+; GFX9-CONTRACT-NEXT: buffer_load_dword v35, off, s[0:3], s32 offset:20
+; GFX9-CONTRACT-NEXT: buffer_load_dword v36, off, s[0:3], s32 offset:24
+; GFX9-CONTRACT-NEXT: buffer_load_dword v37, off, s[0:3], s32 offset:28
+; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(5)
 ; GFX9-CONTRACT-NEXT: v_fma_f64 v[16:17], v[16:17], v[24:25], v[31:32]
-; GFX9-CONTRACT-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:12
-; GFX9-CONTRACT-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:16
+; GFX9-CONTRACT-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-CONTRACT-NEXT: buffer_load_dword v38, off, s[0:3], s32 offset:32
+; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(5)
+; GFX9-CONTRACT-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[33:34]
+; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(3)
+; GFX9-CONTRACT-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[35:36]
 ; GFX9-CONTRACT-NEXT: v_fma_f64 v[0:1], v[0:1], v[8:9], v[16:17]
-; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
-; GFX9-CONTRACT-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[24:25]
-; GFX9-CONTRACT-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:20
-; GFX9-CONTRACT-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:24
 ; GFX9-CONTRACT-NEXT: v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
-; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
-; GFX9-CONTRACT-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-CONTRACT-NEXT: buffer_load_dword v31, off, s[0:3], s32
-; GFX9-CONTRACT-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:28
-; GFX9-CONTRACT-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:32
 ; GFX9-CONTRACT-NEXT: v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
 ; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
-; GFX9-CONTRACT-NEXT: v_fma_f64 v[22:23], v[22:23], v[30:31], v[24:25]
+; GFX9-CONTRACT-NEXT: v_fma_f64 v[22:23], v[22:23], v[30:31], v[37:38]
 ; GFX9-CONTRACT-NEXT: v_fma_f64 v[6:7], v[6:7], v[14:15], v[22:23]
 ; GFX9-CONTRACT-NEXT: s_setpc_b64 s[30:31]
 ;
@@ -725,24 +725,24 @@ define <4 x double> @test_f64_add_mul(<4 x double> %a, <4 x double> %b, <4 x dou
 ; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX9-DENORM-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
 ; GFX9-DENORM-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:8
-; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0)
+; GFX9-DENORM-NEXT: buffer_load_dword v33, off, s[0:3], s32 offset:12
+; GFX9-DENORM-NEXT: buffer_load_dword v34, off, s[0:3], s32 offset:16
+; GFX9-DENORM-NEXT: buffer_load_dword v35, off, s[0:3], s32 offset:20
+; GFX9-DENORM-NEXT: buffer_load_dword v36, off, s[0:3], s32 offset:24
+; GFX9-DENORM-NEXT: buffer_load_dword v37, off, s[0:3], s32 offset:28
+; GFX9-DENORM-NEXT: s_waitcnt vmcnt(5)
 ; GFX9-DENORM-NEXT: v_fma_f64 v[16:17], v[16:17], v[24:25], v[31:32]
-; GFX9-DENORM-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:12
-; GFX9-DENORM-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:16
+; GFX9-DENORM-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-DENORM-NEXT: buffer_load_dword v38, off, s[0:3], s32 offset:32
+; GFX9-DENORM-NEXT: s_waitcnt vmcnt(5)
+; GFX9-DENORM-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[33:34]
+; GFX9-DENORM-NEXT: s_waitcnt vmcnt(3)
+; GFX9-DENORM-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[35:36]
 ; GFX9-DENORM-NEXT: v_fma_f64 v[0:1], v[0:1], v[8:9], v[16:17]
-; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0)
-; GFX9-DENORM-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[24:25]
-; GFX9-DENORM-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:20
-; GFX9-DENORM-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:24
 ; GFX9-DENORM-NEXT: v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
-; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0)
-; GFX9-DENORM-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-DENORM-NEXT: buffer_load_dword v31, off, s[0:3], s32
-; GFX9-DENORM-NEXT: buffer_load_dword v24, off, s[0:3], s32 o...
[truncated]
``````````

</details>

https://github.com/llvm/llvm-project/pull/184400

_______________________________________________
llvm-branch-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
