The fix for PR117868.

In brief: this is an LRA bug caused by reusing spill slots after frame pointer spilling. A slot was created for a QImode pseudo (1 byte). After the frame pointer was spilled, the slot was reused for a TImode pseudo (16 bytes long), which then overlaps other slots.
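To make the overlap concrete, here is a minimal standalone sketch; it is not GCC code. The `byte_range' type and the `ranges_overlap' helper are hypothetical names. The offsets are inferred from the reload dump quoted in this message: byte 6 of r114 is stored at sfp+24, which puts the start of r114 (and so of the reused slot) at sfp+18, and pseudo r69 is reloaded from sfp+24.

```c
#include <stdbool.h>

/* Hypothetical model: a spill slot as a byte range [offset, offset + size)
   relative to the soft frame pointer.  */
struct byte_range
{
  int offset;
  int size;
};

/* Return true if the two byte ranges share at least one byte.  */
static bool
ranges_overlap (struct byte_range a, struct byte_range b)
{
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* Slot #17 as created for r93 (QImode, 1 byte).  Offset 18 is inferred.  */
static const struct byte_range slot17_as_qi = { 18, 1 };

/* The same slot when reused for r114 (TImode, 16 bytes).  */
static const struct byte_range slot17_as_ti = { 18, 16 };

/* The 1-byte slot from which r69 is reloaded (sfp+24 in the dump).  */
static const struct byte_range r69_slot = { 24, 1 };
```

With these values the 1-byte use of slot #17 does not touch the r69 slot, while the 16-byte reuse covers sfp+18 through sfp+33 and therefore includes sfp+24.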
Details:

After IRA pass we have:
------------------ part of simd-t.c.319r.ira ----------------------
(insn 269 268 270 8 (set (subreg:QI (reg:TI 114 [ _116 ]) 6)
        (xor:QI (reg:QI 66 [ _36 ])
            (reg:QI 67 [ _37 ]))) 631 {xorqi3}
     (expr_list:REG_DEAD (reg:QI 67 [ _37 ])
        (expr_list:REG_DEAD (reg:QI 66 [ _36 ])
            (nil))))
(insn 270 269 271 8 (set (subreg:QI (reg:TI 114 [ _116 ]) 7)
        (xor:QI (reg:QI 69 [ _39 ])
            (reg:QI 70 [ _40 ]))) 631 {xorqi3}
     (expr_list:REG_DEAD (reg:QI 70 [ _40 ])
        (expr_list:REG_DEAD (reg:QI 69 [ _39 ])
            (nil))))
-------------------------------------------------------------------

While LRA spilling:
------------------ part of simd-t.c.320r.reload -------------------
      Creating newreg=348 from oldreg=66, assigning class GENERAL_REGS to r348
  269: r348:QI=r348:QI^r67:QI
      REG_DEAD r67:QI
      REG_DEAD r66:QI
    Inserting insn reload before:
  543: r348:QI=r66:QI
    Inserting insn reload after:
  544: r114:TI#6=r348:QI

[...]
         Choosing alt 0 in insn 270:  (0) =r  (1) %0  (2) r {xorqi3}
      Creating newreg=349 from oldreg=69, assigning class GENERAL_REGS to r349
      Creating newreg=350 from oldreg=70, assigning class GENERAL_REGS to r350
  270: r349:QI=r349:QI^r350:QI
      REG_DEAD r70:QI
      REG_DEAD r69:QI
    Inserting insn reload before:
  545: r349:QI=r69:QI
  547: r350:QI=r70:QI
    Inserting insn reload after:
  546: r114:TI#7=r349:QI
-------------------------------------------------------------------

After LRA pass:
------------------ part of simd-t.c.320r.reload -------------------
(insn 543 542 269 11 (set (reg:QI 14 r14 [orig:66 _36 ] [66])
        (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 7 [0x7])) [4 %sfp+7 S1 A8])) 113 {movqi_insn_split}
     (nil))
(insn 269 543 544 11 (set (reg:QI 14 r14 [orig:66 _36 ] [66])
        (xor:QI (reg:QI 14 r14 [orig:66 _36 ] [66])
            (reg:QI 2 r2 [orig:67 _37 ] [67]))) 631 {xorqi3}
     (nil))
(insn 544 269 545 11 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 24 [0x18])) [4 %sfp+24 S1 A8])
        (reg:QI 14 r14 [orig:66 _36 ] [66])) 113 {movqi_insn_split}
     (nil))
(insn 545 544 547 11 (set (reg:QI 14 r14 [orig:69 _39 ] [69])
        (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 24 [0x18])) [4 %sfp+24 S1 A8])) 113 {movqi_insn_split}
     (nil))
(insn 547 545 270 11 (set (reg:QI 13 r13 [orig:70 _40 ] [70])
        (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 8 [0x8])) [4 %sfp+8 S1 A8])) 113 {movqi_insn_split}
     (nil))
(insn 270 547 546 11 (set (reg:QI 14 r14 [orig:69 _39 ] [69])
        (xor:QI (reg:QI 14 r14 [orig:69 _39 ] [69])
            (reg:QI 13 r13 [orig:70 _40 ] [70]))) 631 {xorqi3}
     (nil))
(insn 546 270 548 11 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 25 [0x19])) [4 %sfp+25 S1 A8])
        (reg:QI 14 r14 [orig:69 _39 ] [69])) 113 {movqi_insn_split}
     (nil))
-------------------------------------------------------------------

Simplified insns before LRA:
----------------------------------
insn 269 r114.6 = r66 ^ r67
insn 270 r114.7 = r69 ^ r70
----------------------------------

after LRA:
----------------------------------
insn 543 r14 {r66} = [sfp+7]        # reload insn
insn 269 r14 {r66} ^= r2 {r67}
insn 544 [sfp+24] = r14 {r66}       # reload insn -- bug is here !
insn 545 r14 {r69} = [sfp+24]       # reload insn
insn 547 r13 {r70} = [sfp+8]        # reload insn
insn 270 r14 {r69} ^= r13 {r70}
insn 546 [sfp+25] = r14 {r69}       # reload insn
----------------------------------

The bug appears in insn 544. It stores the result computed for pseudo r66 to the spill address `[sfp+24]', which is the same address that insn 545 uses to reload pseudo r69.

The problem is here:
------------------ part of simd-t.c.320r.reload -------------------
  Slot 17 regnos (width = 0):    93   114
-------------------------------------------------------------------
Where:
  r93 is a QImode register
  r114 is a TImode register (occupies 16 registers on AVR target !)

Sharing the slot between r93 and r114 is not a problem in itself (the live ranges of r93 and r114 do not intersect). The issue is with the other spill slots: r114 is a large register, so it overlaps them.

Wrong things happened while `lra_spill ()':
---------------------------- part of lra-spills.cc ----------------------------
  n = assign_spill_hard_regs (pseudo_regnos, n);
  slots_num = 0;
  assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n);  <--- first call ---
  for (i = 0; i < n; i++)
    if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
      assign_mem_slot (pseudo_regnos[i]);
  if ((n2 = lra_update_fp2sp_elimination (pseudo_regnos)) > 0)
    {
      /* Assign stack slots to spilled pseudos assigned to fp.  */
      assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n2);  <--- second call ---
      for (i = 0; i < n2; i++)
        if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
          assign_mem_slot (pseudo_regnos[i]);
    }
------------------------------------------------------------------------------

In the first call of `assign_stack_slot_num_and_sort_pseudos(...)' LRA allocates slot #17 for r93 (QImode - 1 byte).
In the second call of `assign_stack_slot_num_and_sort_pseudos(...)' LRA reuses slot #17 for r114 (TImode - 16 bytes).
It's wrong.
We can't reuse the 1-byte slot #17 for a 16-byte register.

With the patch, a slot is reused only if it has no allocated memory yet, or if the new register is of equal or smaller size and has equal or smaller alignment.

Also, a small fix for the debugging output of slot width: print the slot size as the width, not the 0 that is the size of (mem/c:BLK (...)).

On x86_64, it bootstraps+regtests fine.
Ok for trunk ?

Denis.

        PR rtl-optimization/117868
gcc/
        * lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Reuse slots
        only without allocated memory or only with equal or smaller registers
        with equal or smaller alignment.
        (lra_spill): Print slot size as width.

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index db78dcd28a3..93a0c92db9f 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -386,7 +386,18 @@ assign_stack_slot_num_and_sort_pseudos (int *pseudo_regnos, int n)
                 && ! (lra_intersected_live_ranges_p
                       (slots[j].live_ranges,
                        lra_reg_info[regno].live_ranges)))
-              break;
+              {
+                /* A slot without allocated memory can be shared.  */
+                if (slots[j].mem == NULL_RTX)
+                  break;
+
+                /* A slot with allocated memory can be shared only with equal
+                   or smaller register with equal or smaller alignment.  */
+                if (slots[j].align >= spill_slot_alignment (mode)
+                    && compare_sizes_for_sort (slots[j].size,
+                                               GET_MODE_SIZE (mode)) != -1)
+                  break;
+              }
             }
           if (j >= slots_num)
             {
@@ -656,8 +667,7 @@ lra_spill (void)
       for (i = 0; i < slots_num; i++)
         {
           fprintf (lra_dump_file, "  Slot %d regnos (width = ", i);
-          print_dec (GET_MODE_SIZE (GET_MODE (slots[i].mem)),
-                     lra_dump_file, SIGNED);
+          print_dec (slots[i].size, lra_dump_file, SIGNED);
           fprintf (lra_dump_file, "):");
           for (curr_regno = slots[i].regno;;
                curr_regno = pseudo_slots[curr_regno].next - pseudo_slots)
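
For experimenting with the rule outside GCC, the check added to `assign_stack_slot_num_and_sort_pseudos' can be modeled in standalone C roughly as below. This is only a sketch under assumptions: `slot_model' and `can_share_slot' are hypothetical names, sizes are plain byte counts (so `compare_sizes_for_sort (...) != -1' is modeled as `>='), and the live ranges are assumed to have already been found not to intersect.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the slot fields the patch looks at.  */
struct slot_model
{
  bool has_mem;    /* models slots[j].mem != NULL_RTX */
  unsigned size;   /* slot size in bytes, models slots[j].size */
  unsigned align;  /* slot alignment in bits, models slots[j].align */
};

/* Return true if a pseudo of the given size and alignment may share SLOT.
   The caller is assumed to have checked that live ranges do not
   intersect.  */
static bool
can_share_slot (const struct slot_model *slot, unsigned size, unsigned align)
{
  /* A slot without allocated memory can be shared.  */
  if (!slot->has_mem)
    return true;

  /* A slot with allocated memory can be shared only with an equal or
     smaller register with equal or smaller alignment.  */
  return slot->align >= align && slot->size >= size;
}
```

Under this model, a slot already allocated as 1 byte with 8-bit alignment (like slot #17 for r93) rejects a 16-byte pseudo such as r114, still accepts another 1-byte pseudo, and a slot that has no memory yet accepts either.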