https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
            Bug ID: 110237
           Summary: gcc.dg/torture/pr58955-2.c is miscompiled by RTL
                    scheduling after reload
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When compiling the testcase with fully masked AVX512 vectorization,

  -march=znver4 --param=vect-partial-vector-usage=2 -fdiagnostics-plain-output -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions

RTL sched2 is presented with

(insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7fda7413dd80 b>)
                        (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
        (vec_merge:V16SI (reg:V16SI 21 xmm1 [118])
            (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                    (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7fda7413dd80 b>)
                            (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
            (reg:HI 69 k1 [116]))) "/space/rguenther/src/gcc11queue/gcc/testsuite/gcc.dg/torture/pr58955-2.c":12:12 1942 {avx512f_storev16si_mask}
     (expr_list:REG_DEAD (reg:HI 69 k1 [116])
        (expr_list:REG_DEAD (reg:DI 40 r12 [orig:90 _22 ] [90])
            (expr_list:REG_DEAD (reg:V16SI 21 xmm1 [118])
                (nil)))))
...
(insn 41 39 42 3 (set (reg:CCZ 17 flags)
        (compare:CCZ (mem/c:SI (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7fda7413dd80 b>)
                        (const_int 4 [0x4]))) [1 b[1]+0 S4 A32])
            (const_int 1 [0x1]))) "/space/rguenther/src/gcc11queue/gcc/testsuite/gcc.dg/torture/pr58955-2.c":15:6 11 {*cmpsi_1}
     (nil))

and it moves the masked store across the load of one of the destination's
elements:

-   32: xmm0:V16QI=vec_duplicate(bx:QI)
-      REG_DEAD bx:QI
-   33: NOTE_INSN_DELETED
-   34: k1:HI=unspec[xmm0:V16QI,[`*.LC0'],0x6] 146
-      REG_DEAD xmm0:V16QI
    36: cx:SI=0x1
       REG_EQUIV 0x1
+   41: flags:CCZ=cmp([const(`b'+0x4)],0x1)
+   32: xmm0:V16QI=vec_duplicate(bx:QI)
+      REG_DEAD bx:QI
    35: xmm1:V16SI=vec_duplicate(cx:SI)
       REG_DEAD cx:SI
       REG_EQUIV const_vector
+   34: k1:HI=unspec[xmm0:V16QI,[`*.LC0'],0x6] 146
+      REG_DEAD xmm0:V16QI
+   39: [`a']=0x2
    38: [r12:DI+const(`b'-0x4)]=vec_merge(xmm1:V16SI,[r12:DI+const(`b'-0x4)],k1:HI)
       REG_DEAD k1:HI
       REG_DEAD r12:DI
       REG_DEAD xmm1:V16SI
-   39: [`a']=0x2
-   41: flags:CCZ=cmp([const(`b'+0x4)],0x1)

The address of the masked store is computed oddly, though:

   14: r12:DI=dx:DI<<0x2+0x4
      REG_DEAD dx:DI

and in the end the assembly contains

        leaq    4(,%rdx,4), %r12
...
        cmpl    $1, b+4(%rip)
...
        vmovdqu32       %zmm1, b-4(%r12){%k1}

(%rdx is zero, so %r12 is 4 and the masked store starts at b itself; the
compare of b[1] now executes before the store that can write b[1].)
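
To make the overlap explicit, here is a minimal sketch of the address arithmetic from the dump above. It is not taken from the testcase; the function names and the k1 values passed in are hypothetical, since the report does not show the actual mask. It assumes 16 lanes of 4 bytes, as in the V16SI store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { LANE_BYTES = 4, LANES = 16 };  /* V16SI: 16 lanes of 4-byte ints */

/* Byte offset, relative to b, of lane 0 of the masked store. */
int64_t store_offset_from_b(int64_t rdx)
{
    int64_t r12 = rdx * 4 + 4;  /* leaq 4(,%rdx,4), %r12 */
    return r12 - 4;             /* vmovdqu32 ..., b-4(%r12){%k1} */
}

/* True if a lane enabled in k1 writes the 4 bytes at load_off
   (also relative to b).  k1 is an assumed mask value.  */
bool masked_store_clobbers_load(int64_t rdx, uint16_t k1, int64_t load_off)
{
    int64_t delta = load_off - store_offset_from_b(rdx);
    if (delta < 0 || delta >= LANES * LANE_BYTES)
        return false;                 /* load lies outside the 64-byte store */
    assert(delta % LANE_BYTES == 0);  /* both accesses are int-aligned here */
    return ((k1 >> (delta / LANE_BYTES)) & 1u) != 0;
}
```

With %rdx == 0 the store begins at offset 0 from b, so the load at b+4 (insn 41) falls in lane 1. Whenever bit 1 of k1 is set, the two accesses conflict and the scheduler must not reorder them.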