https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070
--- Comment #7 from Alex Coplan <acoplan at gcc dot gnu.org> ---

Just to give a concrete example / reduced testcase where this goes wrong (to
aid review).

For the following testcase (reduced from libiberty) with -O2 -mlate-ldp-fusion:

struct {
  unsigned D;
  int E;
} * sha1_process_block_ctx;
void *sha1_process_block_buffer;
int sha1_process_block_ctx_1, sha1_process_block_ctx_0,
    sha1_process_block_ctx_3, sha1_process_block_d, sha1_process_block_e,
    sha1_process_block_tm, sha1_process_block_a, sha1_process_block_x_6,
    sha1_process_block_x_14, sha1_process_block_x_15;
unsigned sha1_process_block_ctx_2;
void sha1_process_block() {
  int *words = sha1_process_block_buffer;
  int endp = *words, x_0;
  int x[6];
  unsigned b, c;
  while (endp) {
    int t = 0;
    for (; t < 6;)
      t = *words;
    sha1_process_block_a +=
        sha1_process_block_ctx_2 + 8348 + sha1_process_block_tm;
    x_0 += sha1_process_block_tm = x[73];
    b += sha1_process_block_x_15 = sha1_process_block_tm;
    sha1_process_block_a += b | 1;
    sha1_process_block_tm = sha1_process_block_x_14 ^ 8;
    sha1_process_block_e = sha1_process_block_tm;
    sha1_process_block_tm = x[8];
    c += sha1_process_block_x_14 = sha1_process_block_tm;
    b += sha1_process_block_x_15;
    sha1_process_block_tm = x_0 ^ x[3];
    sha1_process_block_a += sha1_process_block_tm;
    sha1_process_block_tm = x[4] ^ x[15];
    sha1_process_block_e +=
        sha1_process_block_a + b ^ sha1_process_block_d + sha1_process_block_tm;
    sha1_process_block_tm = sha1_process_block_x_6 ^ x[15];
    sha1_process_block_d +=
        sha1_process_block_e >> 5 +
        (sha1_process_block_x_6 = sha1_process_block_tm);
    sha1_process_block_ctx_0 += sha1_process_block_ctx_1 +=
        sha1_process_block_ctx_2 += c;
    sha1_process_block_ctx_3 += sha1_process_block_ctx->E +=
        sha1_process_block_e;
  }
}

we try to do this:

fusing pair [L=0] (200,199), base=31, hazards: (27,54), move_range: (54,54)

with the initial IR:

insn i200 in bb3 [ebb3] at point 102:
  +---------------------------
  |  200: [sp:DI+0x64]=x0:SI
  |      REG_DEAD x0:SI
  +---------------------------
  uses:
    use of set r0:i37 (x0:SI)
    use of phi node r31:a12 (sp:DI)
      appears inside an address
  defines:
    set mem:i200

insn i198 in bb3 [ebb3] at point 104:
  +---------------------------
  |  198: [sp:DI+0x6c]=x2:SI
  |      REG_DEAD x2:SI
  +---------------------------
  uses:
    use of set r2:i81 (x2:SI)
    use of phi node r31:a12 (sp:DI)
      appears inside an address
  defines:
    set mem:i198
      used by insn i27 in bb3 [ebb3] at point 108

insn i54 in bb3 [ebb3] at point 106:
  +--------------------------
  |   54: x2:SI=x16:SI<<0x1
  +--------------------------
  uses:
    SI use of set r16:i28 (x16:DI)
  defines:
    set r2:i54 (x2:SI)
      used by insn i199 in bb3 [ebb3] at point 110

insn i27 in bb3 [ebb3] at point 108:
  +--------------------------------------------
  |   27: x0:DI=zero_extend([x1:DI+0x18])
  |      REG_EQUAL [const(`*.LANCHOR0'+0x18)]
  +--------------------------------------------
  uses:
    use of set r1:i223 (x1:DI)
      appears inside an address
    use of set mem:i198
  defines:
    set r0:i27 (x0:DI)
      live out from bb3 [ebb3] at point 114
      used by phi node r0:a15 (x0:DI) in ebb6 at point 116

insn i199 in bb3 [ebb3] at point 110:
  +---------------------------
  |  199: [sp:DI+0x68]=x2:SI
  |      REG_DEAD x2:SI
  +---------------------------
  uses:
    use of set r2:i54 (x2:SI)
    use of phi node r31:a12 (sp:DI)
      appears inside an address
  defines:
    set mem:i199
      used by phi node mem:a15 in ebb6 at point 116

as it stands, after fusing that pair, we have:

insn i200 in bb3 [ebb3] at point 102:
  +--------------------------
  |  200: clobber [scratch]
  +--------------------------
  defines:
    set mem:i200

insn i198 in bb3 [ebb3] at point 104:
  +---------------------------
  |  198: [sp:DI+0x6c]=x2:SI
  |      REG_DEAD x2:SI
  +---------------------------
  uses:
    use of set r2:i81 (x2:SI)
    use of phi node r31:a12 (sp:DI)
      appears inside an address
  defines:
    set mem:i198
      used by insn i27 in bb3 [ebb3] at point 108

insn i54 in bb3 [ebb3] at point 106:
  +--------------------------
  |   54: x2:SI=x16:SI<<0x1
  +--------------------------
  uses:
    SI use of set r16:i28 (x16:DI)
  defines:
    set r2:i54 (x2:SI)
      used by insn i244 in bb3 [ebb3] at point 107

insn i244 in bb3 [ebb3] at point 107:
  +--------------------------------------------
  |  244: [sp:DI+0x64]=unspec[x0:SI,x2:SI] 38
  +--------------------------------------------
  uses:
    use of set r0:i37 (x0:SI)
    use of set r2:i54 (x2:SI)
    use of phi node r31:a12 (sp:DI)
      appears inside an address
  defines:
    set mem:i244

insn i27 in bb3 [ebb3] at point 108:
  +--------------------------------------------
  |   27: x0:DI=zero_extend([x1:DI+0x18])
  |      REG_EQUAL [const(`*.LANCHOR0'+0x18)]
  +--------------------------------------------
  uses:
    use of set r1:i223 (x1:DI)
      appears inside an address
    use of set mem:i198
  defines:
    set r0:i27 (x0:DI)
      live out from bb3 [ebb3] at point 114
      used by phi node r0:a15 (x0:DI) in ebb6 at point 116

insn i199 in bb3 [ebb3] at point 110:
  +--------------------------
  |  199: clobber [scratch]
  +--------------------------
  defines:
    set mem:i199
      used by phi node mem:a15 in ebb6 at point 116

The use problem is already visible here: i27 is consuming mem from i198, but it
should be consuming mem from our newly-inserted stp (i244).

The def problem is visible if we look in GDB:

(gdb) call debug (i2)
insn i199 in bb3 [ebb3] at point 110:
  +--------------------------
  |  199: clobber [scratch]
  +--------------------------
  defines:
    set mem:i199
      used by phi node mem:a15 in ebb6 at point 116
(gdb) call debug (i2->defs ()[0])
set mem:i199 in bb3 [ebb3] at point 110
  used by phi node mem:a15 in ebb6 at point 116
(gdb) call debug (i2->defs ()[0]->prev_def ())
set mem:i198 in bb3 [ebb3] at point 104
  used by insn i27 in bb3 [ebb3] at point 108

here the previous def should be our new stp (i244) instead of i198.

I have patches to fix both of these issues.
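For readers less familiar with the def/use chains involved, here is a toy C++ sketch of the invariant that is being violated. It is not the rtl-ssa API and not the actual patches: MemDef, MemUse, splice_def and retarget_uses are hypothetical names invented for illustration. It assumes a doubly-linked chain of memory defs kept in program-point order, plus a flat list of uses.

```cpp
#include <cassert>
#include <vector>

// Toy model only: hypothetical types, not GCC's rtl-ssa classes.
struct MemDef {
  int insn;      // insn uid, e.g. 198
  int point;     // program point of the defining insn
  MemDef *prev;  // previous def of memory in program order
  MemDef *next;  // next def of memory in program order
  MemDef(int i, int p) : insn(i), point(p), prev(nullptr), next(nullptr) {}
};

struct MemUse {
  int insn;     // insn uid of the consumer, e.g. 27
  int point;    // program point of the consumer
  MemDef *def;  // def of memory that this use consumes
  MemUse(int i, int p, MemDef *d) : insn(i), point(p), def(d) {}
};

// The "def problem": splice new_def into the point-ordered chain that
// starts at first, so that the following def's prev link names new_def.
void splice_def(MemDef *first, MemDef *new_def) {
  assert(first && new_def && first->point < new_def->point);
  MemDef *before = first;
  while (before->next && before->next->point < new_def->point)
    before = before->next;
  new_def->prev = before;
  new_def->next = before->next;
  if (before->next)
    before->next->prev = new_def;
  before->next = new_def;
}

// The "use problem": any use that still consumes the def just before
// new_def, but sits after new_def's point, must now consume new_def.
void retarget_uses(std::vector<MemUse> &uses, MemDef *new_def) {
  for (MemUse &u : uses)
    if (u.def == new_def->prev && u.point > new_def->point)
      u.def = new_def;
}
```

Mapped onto the dump above: with defs i198 (point 104) and i199 (point 110) and a use in i27 (point 108), splicing in i244 (point 107) should leave i199's previous def as i244 and i27 consuming i244 rather than i198.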