llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-powerpc

Author: Guy David (guy-david)

<details>
<summary>Changes</summary>

Requires https://github.com/llvm/llvm-project/pull/128745.

Lower it slightly below the likelihood of a null check being true, which is set to 37.5% (see `PtrUntakenProb`). Otherwise, the pass splits the edge and creates another basic block with an unconditional branch, which can make the CFG more complex and lead to suboptimal block placement. Note that if multiple instructions can be sunk from the same edge, a split occurs regardless of this change.

On M4 Pro:
```
$ ./utils/compare.py build-a/results1.json build-a/results2.json build-a/results3.json vs build-b/results1.json build-b/results2.json build-b/results3.json
Tests: 4314
Metric: exec_time

Program                                        exec_time
                                               lhs   rhs   diff
MultiSourc...chmarks/Prolangs-C/agrep/agrep    0.00  0.01  44.7%
MultiSourc...rks/McCat/03-testtrie/testtrie   0.01  0.01  31.7%
SingleSour...hmarks/Shootout/Shootout-lists   2.02  2.64  30.6%
SingleSour...ecute/GCC-C-execute-20170419-1   0.00  0.00  14.3%
SingleSour.../execute/GCC-C-execute-pr59101   0.00  0.00  14.3%
SingleSour...ecute/GCC-C-execute-20040311-1   0.00  0.00  14.3%
SingleSour.../execute/GCC-C-execute-pr57124   0.00  0.00  14.3%
SingleSour...ecute/GCC-C-execute-20031204-1   0.00  0.00  14.3%
SingleSour...xecute/GCC-C-execute-pr57344-3   0.00  0.00  14.3%
SingleSour.../execute/GCC-C-execute-pr57875   0.00  0.00  14.3%
SingleSour...ecute/GCC-C-execute-20030811-1   0.00  0.00  14.3%
SingleSour.../execute/GCC-C-execute-pr58640   0.00  0.00  14.3%
SingleSour...ecute/GCC-C-execute-20030408-1   0.00  0.00  14.3%
SingleSour...ecute/GCC-C-execute-20030323-1   0.00  0.00  14.3%
SingleSour...ecute/GCC-C-execute-20030203-1   0.00  0.00  14.3%
Geomean difference                                         0.1%

       exec_time l/r
                 lhs            rhs         diff
count   4314.000000    4314.000000  4294.000000
mean     453.919219     454.105532     0.002072
std    10865.757400   10868.002426     0.043046
min        0.000000       0.000000    -0.171642
25%        0.000700       0.000700     0.000000
50%        0.007400       0.007400     0.000000
75%        0.047829       0.047950     0.000033
max   321294.306703  321320.624713     0.447368
```

On Ryzen9 5950X:
```
$ ./utils/compare.py build-a/results1.json build-a/results2.json build-a/results3.json vs build-b/results1.json build-b/results2.json build-b/results3.json
Tests: 3326
Metric: exec_time

Program                                        exec_time
                                               lhs      rhs      diff
MemFunctio...mCmp<1, GreaterThanZero, None>  1741.26  1885.00  143.74
MemFunctio..._MemCmp<1, LessThanZero, Last>  1759.78  1873.93  114.15
MemFunctio...est:BM_MemCmp<1, EqZero, Last>  1747.19  1847.42  100.22
MemFunctio...Cmp<1, GreaterThanZero, First>  1750.17  1844.57   94.40
MemFunctio...mCmp<1, GreaterThanZero, Last>  1751.05  1844.68   93.63
MemFunctio...emCmp<1, GreaterThanZero, Mid>  1756.49  1849.62   93.13
MemFunctio..._MemCmp<1, LessThanZero, None>  1744.87  1835.22   90.35
MemFunctio...M_MemCmp<1, LessThanZero, Mid>  1757.53  1846.29   88.77
harris/har...est:BENCHMARK_HARRIS/1024/1024  5689.29  5754.88   65.59
MemFunctio...MemCmp<2, LessThanZero, First>  1123.00  1181.63   58.63
MemFunctio...test:BM_MemCmp<1, EqZero, Mid>  2524.93  2582.21   57.28
MemFunctio...est:BM_MemCmp<1, EqZero, None>  2525.97  2582.43   56.46
MemFunctio..._MemCmp<3, LessThanZero, Last>   869.04   924.66   55.62
MemFunctio...test:BM_MemCmp<3, EqZero, Mid>   878.39   932.53   54.14
MemFunctio...MemCmp<1, LessThanZero, First>  2528.37  2582.27   53.90

exec_time l/r                                                                                     lhs            rhs           diff
Program
test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test:BM_MemCmp<1, GreaterThanZero, None>  1741.261663    1884.998860   143.737197
test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test:BM_MemCmp<1, LessThanZero, Last>     1759.779355    1873.926412   114.147056
test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test:BM_MemCmp<1, EqZero, Last>           1747.192734    1847.416650   100.223916
test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test:BM_MemCmp<1, GreaterThanZero, First> 1750.171003    1844.569735    94.398732
test-suite :: MicroBenchmarks/MemFunctions/MemFunctions.test:BM_MemCmp<1, GreaterThanZero, Last>  1751.049323    1844.682784    93.633461
...                                                                                               ...            ...           ...
test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test:benchVecWithRuntimeChecks4PointersDAfterA/1000                435033.995649  412835.347288  -22198.648362
test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test:benchVecWithRuntimeChecks4PointersAllDisjointDecreasing/1000  435136.829708  412921.450737  -22215.378970
test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test:benchVecWithRuntimeChecks4PointersAllDisjointIncreasing/1000  435136.457427  412908.677876  -22227.779551
test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test:benchVecWithRuntimeChecks4PointersDEqualsA/1000               435088.787446  412769.793042  -22318.994403
test-suite :: MicroBenchmarks/LoopVectorization/LoopVectorizationBenchmarks.test:benchVecWithRuntimeChecks4PointersDBeforeA/1000               835721.265233  791510.926471  -44210.338762

[3326 rows x 3 columns]

       exec_time l/r
                 lhs            rhs           diff
count    3326.000000    3326.000000    3326.000000
mean      916.350942     873.987972     -42.362970
std     20951.565020   19865.106212    1087.788132
min         0.000000       0.000000  -44210.338762
25%         0.000000       0.000000      -0.000400
50%         0.000400       0.000400       0.000000
75%         1.774625       1.732975       0.000400
max    835721.265233  791510.926471     143.737197
```

I looked into the disassembly of `BM_MemCmp<1, GreaterThanZero, None>` in `MemFunctions.test`, and it has not changed.
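Purely as an illustration of the threshold arithmetic described above (this sketch is not part of the patch, and `edge_is_split` is a made-up helper name; the real logic lives in `MachineSink.cpp`): an edge whose branch probability exceeds the split threshold is speculated over instead of split, so moving the option from 40 to 35 places it just below `PtrUntakenProb` = 3/8 = 37.5%.

```python
# Hypothetical model of the comparison: MachineSink splits a critical edge
# only when the edge's branch probability does NOT exceed the threshold.
PTR_UNTAKEN_PROB = 3 / 8   # 37.5%, the assumed probability of a null check being true
OLD_THRESHOLD = 40 / 100   # previous cl::init(40) value, interpreted as a percentage
NEW_THRESHOLD = 35 / 100   # value after this patch, cl::init(35)

def edge_is_split(edge_prob: float, threshold: float) -> bool:
    """True when the edge is cold enough that sinking would split it."""
    return edge_prob <= threshold

# A 37.5% null-check edge used to fall below 40% and get split;
# with the 35% threshold it is now speculated over instead.
print(edge_is_split(PTR_UNTAKEN_PROB, OLD_THRESHOLD))  # edge was split before
print(edge_is_split(PTR_UNTAKEN_PROB, NEW_THRESHOLD))  # no split after the patch
```

This only models the motivation stated above; the actual pass compares `MachineBranchProbabilityInfo` results against the option, and still splits when more than one instruction sinks along the edge.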
---

Patch is 226.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127666.diff

51 Files Affected:

- (modified) llvm/lib/CodeGen/MachineSink.cpp (+1-1)
- (modified) llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll (+23-23)
- (modified) llvm/test/CodeGen/AArch64/swifterror.ll (+2-4)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll (+5-7)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll (+5-7)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll (+5-7)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll (+5-7)
- (modified) llvm/test/CodeGen/AMDGPU/artificial-terminators.mir (+5-9)
- (modified) llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll (+32-32)
- (modified) llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll (+20-16)
- (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll (+64-52)
- (modified) llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-mov.ll (+24-18)
- (modified) llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll (+8-7)
- (modified) llvm/test/CodeGen/AMDGPU/skip-if-dead.ll (+8-16)
- (modified) llvm/test/CodeGen/ARM/and-cmp0-sink.ll (+14-14)
- (modified) llvm/test/CodeGen/Mips/llvm-ir/sdiv-freebsd.ll (+6-3)
- (modified) llvm/test/CodeGen/PowerPC/common-chain-aix32.ll (+16-17)
- (modified) llvm/test/CodeGen/PowerPC/common-chain.ll (+80-89)
- (modified) llvm/test/CodeGen/PowerPC/ifcvt_cr_field.ll (+6-12)
- (modified) llvm/test/CodeGen/PowerPC/knowCRBitSpill.ll (+1)
- (modified) llvm/test/CodeGen/PowerPC/loop-instr-form-prepare.ll (+88-114)
- (modified) llvm/test/CodeGen/PowerPC/loop-instr-prep-non-const-increasement.ll (+16-19)
- (modified) llvm/test/CodeGen/PowerPC/mma-phi-accs.ll (+6-12)
- (modified) llvm/test/CodeGen/PowerPC/p10-spill-creq.ll (+28-33)
- (modified) llvm/test/CodeGen/PowerPC/ppc64-rop-protection-aix.ll (+54-60)
- (modified) llvm/test/CodeGen/PowerPC/ppc64-rop-protection.ll (+66-81)
- (modified) llvm/test/CodeGen/PowerPC/shrink-wrap.ll (+12-20)
- (modified) llvm/test/CodeGen/PowerPC/spe.ll (+2-4)
- (modified) llvm/test/CodeGen/PowerPC/zext-and-cmp.ll (+16-6)
- (modified) llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll (+6-12)
- (modified) llvm/test/CodeGen/Thumb2/mve-pipelineloops.ll (+7-10)
- (modified) llvm/test/CodeGen/WebAssembly/implicit-def.ll (+23-12)
- (modified) llvm/test/CodeGen/X86/2007-11-06-InstrSched.ll (+6-9)
- (modified) llvm/test/CodeGen/X86/2008-04-28-CoalescerBug.ll (+8-11)
- (modified) llvm/test/CodeGen/X86/atomic-rm-bit-test-64.ll (+61-68)
- (modified) llvm/test/CodeGen/X86/atomic-rm-bit-test.ll (+342-395)
- (modified) llvm/test/CodeGen/X86/branchfolding-debugloc.ll (+4-5)
- (modified) llvm/test/CodeGen/X86/break-false-dep.ll (+28-48)
- (modified) llvm/test/CodeGen/X86/coalescer-commute4.ll (+6-9)
- (modified) llvm/test/CodeGen/X86/ctlo.ll (+21-26)
- (modified) llvm/test/CodeGen/X86/ctlz.ll (+56-72)
- (modified) llvm/test/CodeGen/X86/cttz.ll (+18-30)
- (modified) llvm/test/CodeGen/X86/fold-loop-of-urem.ll (+26-32)
- (modified) llvm/test/CodeGen/X86/lsr-sort.ll (+6-4)
- (modified) llvm/test/CodeGen/X86/mmx-arith.ll (+7-9)
- (modified) llvm/test/CodeGen/X86/pr2659.ll (+32-10)
- (modified) llvm/test/CodeGen/X86/pr38795.ll (+50-53)
- (modified) llvm/test/CodeGen/X86/probe-stack-eflags.ll (+4-6)
- (modified) llvm/test/CodeGen/X86/taildup-heapallocsite.ll (+4-9)
- (modified) llvm/test/CodeGen/X86/testb-je-fusion.ll (+8-10)
- (modified) llvm/test/CodeGen/X86/x86-shrink-wrapping.ll (+12-16)


``````````diff
diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp
index 82acb780cfb72..81459cf65d6c2 100644
--- a/llvm/lib/CodeGen/MachineSink.cpp
+++ b/llvm/lib/CodeGen/MachineSink.cpp
@@ -82,7 +82,7 @@ static cl::opt<unsigned> SplitEdgeProbabilityThreshold(
         "If the branch threshold is higher than this threshold, we allow "
         "speculative execution of up to 1 instruction to avoid branching to "
         "splitted critical edge"),
-    cl::init(40), cl::Hidden);
+    cl::init(35), cl::Hidden);
 
 static cl::opt<unsigned> SinkLoadInstsPerBlockThreshold(
     "machine-sink-load-instrs-threshold",
diff --git a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
index fb6575cc0ee83..fdc087e9c1991 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll
@@ -632,20 +632,18 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ;
 ; CHECK-GI-LABEL: red_mla_dup_ext_u8_s8_s16:
 ; CHECK-GI:       // %bb.0: // %entry
-; CHECK-GI-NEXT:    cbz w2, .LBB5_3
+; CHECK-GI-NEXT:    mov w8, wzr
+; CHECK-GI-NEXT:    cbz w2, .LBB5_9
 ; CHECK-GI-NEXT:  // %bb.1: // %for.body.preheader
 ; CHECK-GI-NEXT:    cmp w2, #16
 ; CHECK-GI-NEXT:    mov w8, w2
-; CHECK-GI-NEXT:    b.hs .LBB5_4
+; CHECK-GI-NEXT:    b.hs .LBB5_3
 ; CHECK-GI-NEXT:  // %bb.2:
 ; CHECK-GI-NEXT:    mov w10, #0 // =0x0
 ; CHECK-GI-NEXT:    mov x9, xzr
 ; CHECK-GI-NEXT:    fmov s0, w10
-; CHECK-GI-NEXT:    b .LBB5_8
-; CHECK-GI-NEXT:  .LBB5_3:
-; CHECK-GI-NEXT:    mov w0, wzr
-; CHECK-GI-NEXT:    ret
-; CHECK-GI-NEXT:  .LBB5_4: // %vector.ph
+; CHECK-GI-NEXT:    b .LBB5_7
+; CHECK-GI-NEXT:  .LBB5_3: // %vector.ph
 ; CHECK-GI-NEXT:    lsl w9, w1, #8
 ; CHECK-GI-NEXT:    movi v0.2d, #0000000000000000
 ; CHECK-GI-NEXT:    movi v1.2d, #0000000000000000
@@ -654,7 +652,7 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ; CHECK-GI-NEXT:    dup v2.8h, w9
 ; CHECK-GI-NEXT:    and x9, x8, #0xfffffff0
 ; CHECK-GI-NEXT:    mov x11, x9
-; CHECK-GI-NEXT:  .LBB5_5: // %vector.body
+; CHECK-GI-NEXT:  .LBB5_4: // %vector.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
 ; CHECK-GI-NEXT:    ldp d3, d4, [x10, #-8]
 ; CHECK-GI-NEXT:    subs x11, x11, #16
@@ -663,29 +661,31 @@ define i16 @red_mla_dup_ext_u8_s8_s16(ptr noalias nocapture noundef readonly %A,
 ; CHECK-GI-NEXT:    ushll v4.8h, v4.8b, #0
 ; CHECK-GI-NEXT:    mla v0.8h, v2.8h, v3.8h
 ; CHECK-GI-NEXT:    mla v1.8h, v2.8h, v4.8h
-; CHECK-GI-NEXT:    b.ne .LBB5_5
-; CHECK-GI-NEXT:  // %bb.6: // %middle.block
+; CHECK-GI-NEXT:    b.ne .LBB5_4
+; CHECK-GI-NEXT:  // %bb.5: // %middle.block
 ; CHECK-GI-NEXT:    add v0.8h, v1.8h, v0.8h
 ; CHECK-GI-NEXT:    cmp x9, x8
 ; CHECK-GI-NEXT:    addv h0, v0.8h
-; CHECK-GI-NEXT:    b.ne .LBB5_8
-; CHECK-GI-NEXT:  // %bb.7:
-; CHECK-GI-NEXT:    fmov w0, s0
+; CHECK-GI-NEXT:    b.ne .LBB5_7
+; CHECK-GI-NEXT:  // %bb.6:
+; CHECK-GI-NEXT:    fmov w8, s0
+; CHECK-GI-NEXT:    mov w0, w8
 ; CHECK-GI-NEXT:    ret
-; CHECK-GI-NEXT:  .LBB5_8: // %for.body.preheader1
+; CHECK-GI-NEXT:  .LBB5_7: // %for.body.preheader1
 ; CHECK-GI-NEXT:    sxtb w10, w1
-; CHECK-GI-NEXT:    sub x8, x8, x9
+; CHECK-GI-NEXT:    sub x11, x8, x9
 ; CHECK-GI-NEXT:    add x9, x0, x9
-; CHECK-GI-NEXT:  .LBB5_9: // %for.body
+; CHECK-GI-NEXT:  .LBB5_8: // %for.body
 ; CHECK-GI-NEXT:    // =>This Inner Loop Header: Depth=1
-; CHECK-GI-NEXT:    ldrb w11, [x9], #1
+; CHECK-GI-NEXT:    ldrb w8, [x9], #1
 ; CHECK-GI-NEXT:    fmov w12, s0
-; CHECK-GI-NEXT:    subs x8, x8, #1
-; CHECK-GI-NEXT:    mul w11, w11, w10
-; CHECK-GI-NEXT:    add w0, w11, w12, uxth
-; CHECK-GI-NEXT:    fmov s0, w0
-; CHECK-GI-NEXT:    b.ne .LBB5_9
-; CHECK-GI-NEXT:  // %bb.10: // %for.cond.cleanup
+; CHECK-GI-NEXT:    subs x11, x11, #1
+; CHECK-GI-NEXT:    mul w8, w8, w10
+; CHECK-GI-NEXT:    add w8, w8, w12, uxth
+; CHECK-GI-NEXT:    fmov s0, w8
+; CHECK-GI-NEXT:    b.ne .LBB5_8
+; CHECK-GI-NEXT:  .LBB5_9: // %for.cond.cleanup
+; CHECK-GI-NEXT:    mov w0, w8
 ; CHECK-GI-NEXT:    ret
 entry:
   %conv2 = sext i8 %B to i16
diff --git a/llvm/test/CodeGen/AArch64/swifterror.ll b/llvm/test/CodeGen/AArch64/swifterror.ll
index 07ee87e880aff..1ca98f6015c11 100644
--- a/llvm/test/CodeGen/AArch64/swifterror.ll
+++ b/llvm/test/CodeGen/AArch64/swifterror.ll
@@ -412,6 +412,7 @@ define float @foo_if(ptr swifterror %error_ptr_ref, i32 %cc) {
 ; CHECK-APPLE-NEXT:    .cfi_def_cfa w29, 16
 ; CHECK-APPLE-NEXT:    .cfi_offset w30, -8
 ; CHECK-APPLE-NEXT:    .cfi_offset w29, -16
+; CHECK-APPLE-NEXT:    movi d0, #0000000000000000
 ; CHECK-APPLE-NEXT:    cbz w0, LBB3_2
 ; CHECK-APPLE-NEXT:  ; %bb.1: ; %gen_error
 ; CHECK-APPLE-NEXT:    mov w0, #16 ; =0x10
@@ -420,10 +421,7 @@ define float @foo_if(ptr swifterror %error_ptr_ref, i32 %cc) {
 ; CHECK-APPLE-NEXT:    fmov s0, #1.00000000
 ; CHECK-APPLE-NEXT:    mov w8, #1 ; =0x1
 ; CHECK-APPLE-NEXT:    strb w8, [x0, #8]
-; CHECK-APPLE-NEXT:    ldp x29, x30, [sp], #16 ; 16-byte Folded Reload
-; CHECK-APPLE-NEXT:    ret
-; CHECK-APPLE-NEXT:  LBB3_2:
-; CHECK-APPLE-NEXT:    movi d0, #0000000000000000
+; CHECK-APPLE-NEXT:  LBB3_2: ; %common.ret
 ; CHECK-APPLE-NEXT:    ldp x29, x30, [sp], #16 ; 16-byte Folded Reload
 ; CHECK-APPLE-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
index 0c9ff3eee8231..70caf812ea6c2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
@@ -200,6 +200,7 @@ define amdgpu_ps i64 @s_sdiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    s_and_b64 s[0:1], s[0:1], s[6:7]
 ; CHECK-NEXT:    v_cmp_ne_u64_e64 vcc, s[0:1], 0
 ; CHECK-NEXT:    s_mov_b32 s0, 1
+; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
 ; CHECK-NEXT:    s_cbranch_vccz .LBB1_2
 ; CHECK-NEXT:  ; %bb.1:
 ; CHECK-NEXT:    s_ashr_i32 s6, s3, 31
@@ -330,15 +331,12 @@ define amdgpu_ps i64 @s_sdiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_xor_b32_e32 v0, s6, v0
 ; CHECK-NEXT:    s_mov_b32 s0, 0
 ; CHECK-NEXT:    v_subrev_i32_e32 v0, vcc, s6, v0
-; CHECK-NEXT:    s_branch .LBB1_3
-; CHECK-NEXT:  .LBB1_2:
-; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
-; CHECK-NEXT:  .LBB1_3: ; %Flow
+; CHECK-NEXT:  .LBB1_2: ; %Flow
 ; CHECK-NEXT:    s_xor_b32 s0, s0, 1
 ; CHECK-NEXT:    s_and_b32 s0, s0, 1
 ; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB1_5
-; CHECK-NEXT:  ; %bb.4:
+; CHECK-NEXT:    s_cbranch_scc1 .LBB1_4
+; CHECK-NEXT:  ; %bb.3:
 ; CHECK-NEXT:    v_cvt_f32_u32_e32 v0, s4
 ; CHECK-NEXT:    s_sub_i32 s0, 0, s4
 ; CHECK-NEXT:    v_rcp_iflag_f32_e32 v0, v0
@@ -358,7 +356,7 @@ define amdgpu_ps i64 @s_sdiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s4, v1
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CHECK-NEXT:  .LBB1_5:
+; CHECK-NEXT:  .LBB1_4:
 ; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
 ; CHECK-NEXT:    s_mov_b32 s1, s0
 ; CHECK-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
index df645888626c6..2fcbc41895f03 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
@@ -194,6 +194,7 @@ define amdgpu_ps i64 @s_srem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    s_and_b64 s[0:1], s[0:1], s[6:7]
 ; CHECK-NEXT:    v_cmp_ne_u64_e64 vcc, s[0:1], 0
 ; CHECK-NEXT:    s_mov_b32 s7, 1
+; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
 ; CHECK-NEXT:    s_cbranch_vccz .LBB1_2
 ; CHECK-NEXT:  ; %bb.1:
 ; CHECK-NEXT:    s_ashr_i32 s6, s3, 31
@@ -322,15 +323,12 @@ define amdgpu_ps i64 @s_srem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
 ; CHECK-NEXT:    v_xor_b32_e32 v0, s6, v0
 ; CHECK-NEXT:    v_subrev_i32_e32 v0, vcc, s6, v0
-; CHECK-NEXT:    s_branch .LBB1_3
-; CHECK-NEXT:  .LBB1_2:
-; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
-; CHECK-NEXT:  .LBB1_3: ; %Flow
+; CHECK-NEXT:  .LBB1_2: ; %Flow
 ; CHECK-NEXT:    s_xor_b32 s0, s7, 1
 ; CHECK-NEXT:    s_and_b32 s0, s0, 1
 ; CHECK-NEXT:    s_cmp_lg_u32 s0, 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB1_5
-; CHECK-NEXT:  ; %bb.4:
+; CHECK-NEXT:    s_cbranch_scc1 .LBB1_4
+; CHECK-NEXT:  ; %bb.3:
 ; CHECK-NEXT:    v_cvt_f32_u32_e32 v0, s4
 ; CHECK-NEXT:    s_sub_i32 s0, 0, s4
 ; CHECK-NEXT:    v_rcp_iflag_f32_e32 v0, v0
@@ -348,7 +346,7 @@ define amdgpu_ps i64 @s_srem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_subrev_i32_e32 v1, vcc, s4, v0
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
-; CHECK-NEXT:  .LBB1_5:
+; CHECK-NEXT:  .LBB1_4:
 ; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
 ; CHECK-NEXT:    s_mov_b32 s1, s0
 ; CHECK-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
index f5a901b024ef5..c9a5a92188256 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
@@ -193,6 +193,7 @@ define amdgpu_ps i64 @s_udiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_cmp_ne_u64_e64 vcc, s[4:5], 0
 ; CHECK-NEXT:    s_mov_b32 s6, 1
 ; CHECK-NEXT:    v_cvt_f32_u32_e32 v2, s2
+; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
 ; CHECK-NEXT:    s_cbranch_vccz .LBB1_2
 ; CHECK-NEXT:  ; %bb.1:
 ; CHECK-NEXT:    v_mov_b32_e32 v0, s3
@@ -318,15 +319,12 @@ define amdgpu_ps i64 @s_udiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v9, v5, vcc
 ; CHECK-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v3
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
-; CHECK-NEXT:    s_branch .LBB1_3
-; CHECK-NEXT:  .LBB1_2:
-; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
-; CHECK-NEXT:  .LBB1_3: ; %Flow
+; CHECK-NEXT:  .LBB1_2: ; %Flow
 ; CHECK-NEXT:    s_xor_b32 s1, s6, 1
 ; CHECK-NEXT:    s_and_b32 s1, s1, 1
 ; CHECK-NEXT:    s_cmp_lg_u32 s1, 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB1_5
-; CHECK-NEXT:  ; %bb.4:
+; CHECK-NEXT:    s_cbranch_scc1 .LBB1_4
+; CHECK-NEXT:  ; %bb.3:
 ; CHECK-NEXT:    v_rcp_iflag_f32_e32 v0, v2
 ; CHECK-NEXT:    s_sub_i32 s1, 0, s2
 ; CHECK-NEXT:    v_mul_f32_e32 v0, 0x4f7ffffe, v0
@@ -345,7 +343,7 @@ define amdgpu_ps i64 @s_udiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s2, v1
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CHECK-NEXT:  .LBB1_5:
+; CHECK-NEXT:  .LBB1_4:
 ; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
 ; CHECK-NEXT:    s_mov_b32 s1, s0
 ; CHECK-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
index 2be4b52198b45..06e51387c8f21 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
@@ -190,6 +190,7 @@ define amdgpu_ps i64 @s_urem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_cmp_ne_u64_e64 vcc, s[4:5], 0
 ; CHECK-NEXT:    s_mov_b32 s6, 1
 ; CHECK-NEXT:    v_cvt_f32_u32_e32 v2, s2
+; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
 ; CHECK-NEXT:    s_cbranch_vccz .LBB1_2
 ; CHECK-NEXT:  ; %bb.1:
 ; CHECK-NEXT:    v_mov_b32_e32 v0, s3
@@ -314,15 +315,12 @@ define amdgpu_ps i64 @s_urem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v3, v6, vcc
 ; CHECK-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v1
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v4, v0, vcc
-; CHECK-NEXT:    s_branch .LBB1_3
-; CHECK-NEXT:  .LBB1_2:
-; CHECK-NEXT:    ; implicit-def: $vgpr0_vgpr1
-; CHECK-NEXT:  .LBB1_3: ; %Flow
+; CHECK-NEXT:  .LBB1_2: ; %Flow
 ; CHECK-NEXT:    s_xor_b32 s1, s6, 1
 ; CHECK-NEXT:    s_and_b32 s1, s1, 1
 ; CHECK-NEXT:    s_cmp_lg_u32 s1, 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB1_5
-; CHECK-NEXT:  ; %bb.4:
+; CHECK-NEXT:    s_cbranch_scc1 .LBB1_4
+; CHECK-NEXT:  ; %bb.3:
 ; CHECK-NEXT:    v_rcp_iflag_f32_e32 v0, v2
 ; CHECK-NEXT:    s_sub_i32 s1, 0, s2
 ; CHECK-NEXT:    v_mul_f32_e32 v0, 0x4f7ffffe, v0
@@ -339,7 +337,7 @@ define amdgpu_ps i64 @s_urem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_subrev_i32_e32 v1, vcc, s2, v0
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s2, v0
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
-; CHECK-NEXT:  .LBB1_5:
+; CHECK-NEXT:  .LBB1_4:
 ; CHECK-NEXT:    v_readfirstlane_b32 s0, v0
 ; CHECK-NEXT:    s_mov_b32 s1, s0
 ; CHECK-NEXT:    ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/artificial-terminators.mir b/llvm/test/CodeGen/AMDGPU/artificial-terminators.mir
index 1a76cae68f164..9e84d979e8547 100644
--- a/llvm/test/CodeGen/AMDGPU/artificial-terminators.mir
+++ b/llvm/test/CodeGen/AMDGPU/artificial-terminators.mir
@@ -34,18 +34,14 @@ body: |
   ; CHECK-NEXT:   S_BRANCH %bb.1
   ; CHECK-NEXT: {{ $}}
   ; CHECK-NEXT: bb.1:
-  ; CHECK-NEXT:   successors: %bb.5(0x30000000), %bb.2(0x50000000)
+  ; CHECK-NEXT:   successors: %bb.4(0x30000000), %bb.2(0x50000000)
   ; CHECK-NEXT: {{ $}}
   ; CHECK-NEXT:   [[V_CMP_LT_I32_e64_:%[0-9]+]]:sreg_32 = V_CMP_LT_I32_e64 [[V_ADD_U32_e64_3]], [[S_MOV_B32_1]], implicit $exec
   ; CHECK-NEXT:   [[S_XOR_B32_:%[0-9]+]]:sreg_32 = S_XOR_B32 $exec_lo, [[V_CMP_LT_I32_e64_]], implicit-def $scc
-  ; CHECK-NEXT:   $exec_lo = S_MOV_B32_term [[S_XOR_B32_]]
-  ; CHECK-NEXT:   S_CBRANCH_EXECNZ %bb.2, implicit $exec
-  ; CHECK-NEXT: {{ $}}
-  ; CHECK-NEXT: bb.5:
-  ; CHECK-NEXT:   successors: %bb.4(0x80000000)
-  ; CHECK-NEXT: {{ $}}
   ; CHECK-NEXT:   [[COPY3:%[0-9]+]]:sreg_32 = COPY [[V_CMP_LT_I32_e64_]]
-  ; CHECK-NEXT:   S_BRANCH %bb.4
+  ; CHECK-NEXT:   $exec_lo = S_MOV_B32_term [[S_XOR_B32_]]
+  ; CHECK-NEXT:   S_CBRANCH_EXECZ %bb.4, implicit $exec
+  ; CHECK-NEXT:   S_BRANCH %bb.2
   ; CHECK-NEXT: {{ $}}
   ; CHECK-NEXT: bb.2:
   ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.3(0x40000000)
@@ -64,7 +60,7 @@ body: |
   ; CHECK-NEXT:   S_BRANCH %bb.4
   ; CHECK-NEXT: {{ $}}
   ; CHECK-NEXT: bb.4:
-  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:sreg_32 = PHI [[COPY3]], %bb.5, [[S_OR_B32_]], %bb.2, [[S_OR_B32_]], %bb.3
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:sreg_32 = PHI [[COPY3]], %bb.1, [[S_OR_B32_]], %bb.2, [[S_OR_B32_]], %bb.3
   ; CHECK-NEXT:   $exec_lo = S_OR_B32 $exec_lo, [[PHI]], implicit-def $scc
   ; CHECK-NEXT:   S_ENDPGM 0
   bb.0:
diff --git a/llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll b/llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll
index f9ffa5ae57f3e..dfbb5f6a64042 100644
--- a/llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll
+++ b/llvm/test/CodeGen/AMDGPU/blender-no-live-segment-at-def-implicit-def.ll
@@ -9,44 +9,34 @@ define amdgpu_kernel void @blender_no_live_segment_at_def_error(<4 x float> %ext
 ; CHECK-NEXT:    s_addc_u32 s13, s13, 0
 ; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s12
 ; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s13
-; CHECK-NEXT:    s_load_dwordx8 s[36:43], s[8:9], 0x0
+; CHECK-NEXT:    s_load_dwordx8 s[20:27], s[8:9], 0x0
 ; CHECK-NEXT:    s_add_u32 s0, s0, s17
 ; CHECK-NEXT:    s_addc_u32 s1, s1, 0
-; CHECK-NEXT:    s_mov_b32 s12, 0
-; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:    s_cmp_lg_u32 s40, 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB0_8
-; CHECK-NEXT:  ; %bb.1: ; %if.end13.i.i
-; CHECK-NEXT:    s_cmp_eq_u32 s42, 0
-; CHECK-NEXT:    s_cbranch_scc1 .LBB0_4
-; CHECK-NEXT:  ; %bb.2: ; %if.else251.i.i
-; CHECK-NEXT:    s_cmp_lg_u32 s43, 0
-; CHECK-NEXT:    s_mov_b32 s17, 0
-; CHECK-NEXT:    s_cselect_b32 s12, -1, 0
-; CHECK-NEXT:    s_and_b32 vcc_lo, exec_lo, s12
-; CHECK-NEXT:    s_cbranch_vccz .LBB0_5
-; CHECK-NEXT:  ; %bb.3:
 ; CHECK-NEXT:    s_mov_b32 s36, 0
-; CHECK-NEXT:    s_andn2_b32 vcc_lo, exec_lo, s12
-; CHECK-NEXT:    s_cbranch_vccz .LBB0_6
-; CHECK-NEXT:    s_branch .LBB0_7
-; CHECK-NEXT:  .LBB0_4:
-; CHECK-NEXT:    s_mov_b32 s14, s12
-; CHECK-NEXT:    s_mov_b32 s15, s12
-; CHECK-NEXT:    s_mov_b32 s13, s12
-; CHECK-NEXT:    s_mov_b64 s[38:39], s[14:15]
-; CHECK-NEXT:    s_mov_b64 s[36:37], s[12:13]
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    s_cmp_lg_u32 s24, 0
+; CHECK-NEXT:    s_cbranch_scc0 .LBB0_2
+; CHECK-NEXT:  ; %bb.1:
+; CHECK-NEXT:    s_mov_b64 s[38:39], s[22:23]
+; CHECK-NEXT:    s_mov_b64 s[36:37], s[20:21]
 ; CHECK-NEXT:    s_branch .LBB0_7
-; CHECK-NEXT:  .LBB0_5: ; %if.then263.i.i
-; CHECK-NEXT:    v_cmp_lt_f32_e64 s12, s41, 0
-; CHECK-NEXT:    s_mov_b32 s36, 1.0
-; CHECK-NEXT:    s_mov_b32 s17, 0x7fc00000
+; CHECK-NEXT:  .LBB0_2: ; %if.end13.i.i
 ; CHECK-NEXT:    s_mov_b32 s37, s36
 ; CHECK-NEXT:    s_mov_b32 s38, s36
+; CHECK-NEXT:    s_cmp_eq_u32 s26, 0
 ; CHECK-NEXT:    s_mov_b32 s39, s36
+; CHECK-NEXT:    s_cbranch_scc1 .LBB0_6
+; CHECK-NEXT:  ; %bb.3: ; %if.else251.i.i
+; CHECK-NEXT:    s_cmp_lg_u32 s27, 0
+; CHECK-NEXT:    s_mov_b32 s17, 0
+; CHECK-NEXT:    s_cselect_b32 s12, -1, 0
+; CHECK-NEXT:    s_and_b32 vcc_lo, exec_lo, s12
+; CHECK-NEXT:    s_cbranch_vccz .LBB0_8
+; CHECK-NEXT:  ; %bb.4:
+; CHECK-NEXT:    s_mov_b32 s36, 0
 ; CHECK-NEXT:    s_andn2_b32 vcc_lo, exec_lo, s12
-; CHECK-NEXT:    s_cbranch_vccnz .LBB0_7
-; CHECK-NEXT:  .LBB0_6: ; %if.end273.i.i
+; CHECK-NEXT:    s_cbranch_vccnz .LBB0_6
+; CHECK-NEXT:  .LBB0_5: ; %if.end273.i.i
 ; CHECK-NEXT:    s_add_u32 s12, s8, 40
 ; CHECK-NEXT:    s_addc_u32 s13, s9, 0
 ; CHECK-NEXT:    s_getpc_b64 s[18:19]
@@ -72,13 +62,13 @@ define amdgpu_kernel void @blender_no_live_segment_at_def_error(<4 x float> %ext
 ; CHECK-NEXT:    s_mov_b32 s37, s36
 ; CHECK-NEXT:    s_mov_b32 s38, s36
 ; CHECK-NEXT:    s_mov_b32 s39, s36
-; CHECK-NEXT:  .LBB0_7: ; %if.end294.i.i
+; CHECK-NEXT:  .LBB0_6: ; %if.end294.i.i
 ; CHECK-NEXT:    v_mov_b32_e32 v0, 0
 ; CHECK-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:12
 ; CHECK-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:8
 ; CHECK-NEXT:    buffer_store_dword v0, off, s[0:3], 0 offset:4
 ; CHECK-NEXT:    buffer_store_dword v0, off, s[0:3], 0
-; CHECK-NEXT:  .LBB0_8: ; %kernel_direct_lighting.exit
+; CHECK-NEXT:  .LBB0_7: ; %kernel_direct_lighting.exit
 ; CHECK-NEXT:    s_load_dwordx2 s[4:5], s[8:9], 0x20
 ; CHECK-NEXT:    v_mov_b32_e32 v0, s36
 ; CHECK-NEXT:    v_mov_b32_e32 v4, 0
@@ -88,6 +78,16 @@ define amdgpu_kernel void @blender_no_live_segment_at_def_error(<4 x float> %ext
 ; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    global_store_dwordx4 v4, v[0:3], s[4:5]
 ; CHECK-NEXT:    s_endpgm
+; CHECK-NEXT:  .LBB0_8: ; %if.then263.i.i
+; CHECK-NEXT:    v_cmp_lt_f32_e64 s12, s25, 0
+; CHECK-NEXT:    s_mov_b32 s36, 1.0
+; CHECK-NEXT:    s_mov_b32 s17, 0x7fc00000
+; CHECK-NEXT:    s_mov_b32 s37, s36
+; CHECK-NEXT:    s_mov_b32 s38, s36
+; CHECK-NEXT:    s_mov_b32 s39, s36
+; CHECK-NEXT:    s_andn2_b32 vcc_lo, exec_lo, s12
+; CHECK-NEXT:    s_cbranch_vccz .LBB0_5
+; CHECK-NEXT:    s_branch .LBB0_6
 entry:
   %cmp5.i.i = icmp eq i32 %cmp5.i.i.arg, 0
   br i1 %cmp5.i.i, label %if.end13.i.i, label %kernel_direct_lighting.exit
diff --git a/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll b/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
index d61c4b46596c0..ce0b79b0b358c 100644
--- a/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
+++ b/llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll
@@ -848,12 +848,13 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_control_flow(i32 %n, i
 ; GFX9-SDAG-NEXT:    s_load_dwordx2 s[4:5], s[8:9], 0x0
 ; GFX9-SDAG-NEXT:    s_add_u32 s0, s0, s17
 ; GFX9-SDAG-NEXT:    s_addc_u32 s1, s1, 0
+; GFX9-SDAG-NEXT:    s_mov_b64 s[6:7], -1
 ; GFX9-SDAG-NEXT:    s_mov_b32 s33, 0
-; GFX9-SDAG-NEXT:    s_movk_i32 s32, 0x1000
 ; GFX9-SDAG-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX9-SDAG-NEXT:    s_cmp_lg_u32 s4, 0
 ; GFX9-SDAG-NEXT:    s_mov_b32 s4, 0
-; GFX9-SDAG-NEXT:    s_cbranch_scc0 .LBB7_6
+; GFX9-SDAG-NEXT:    s_movk_i32 s32, 0x1000
+; GFX9-SDAG-NEXT:    s_cbranch_scc0 .LBB7_4
 ; GFX9-SDAG-NEXT:  ; %bb.1: ; %bb.1
 ; GFX9-SDAG-NEXT:    v_lshl_add_u32 v0, v0, 2, 15
 ; GFX9-SDAG-NEXT:    v_and_b32_e32 v0, 0x1ff0, v0
@@ -873,8 +874,11 @@ define amdgpu_kernel void @test_dynamic_stackalloc_kernel_control_flow(i32 %n, i
 ; GFX9-SDAG-NEXT:    v_mov_b32_e32 v0, 1
 ; GFX9-SDAG-NEXT:    buffer_store_dword v0, off, s[0:3], s6
 ; GFX9-SDAG-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-SDAG-NEXT:    s_cbranch_execnz .LB... [truncated]
``````````

</details>

https://github.com/llvm/llvm-project/pull/127666

_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits