Hi!  As discussed in the PR, when combine combines something across a setter
of a likely spilled non-fixed hard register, we may get an RA ICE, because the
combined insn might need for reload a hard reg that is live at the point where
the combiner has combined the insn.
The following patch attempts to fix that by not creating LOG_LINKS in certain
cases.  There are two cases which the patch handles separately:
1) a CALL_INSN and the preceding load_register_parameters sequence (from the
   call insn up to the earliest insn in the sequence that sets some likely
   spilled non-fixed hard register);
2) other setters of likely spilled non-fixed hard registers (loading of the
   return value of a function into the return register(s), explicit register
   vars, ...).
1) is handled specially, by disallowing LOG_LINKS from before the first insn
into the sequence, because in between that first insn and the CALL_INSN a
likely spilled hard register is live.  LOG_LINKS from before the sequence to
after the CALL_INSN, from within the sequence to after the call, or from
within the sequence to within the sequence are allowed.
2) other setters are handled just like a basic block boundary for LOG_LINKS:
no LOG_LINKS are allowed to cross that boundary.

Bootstrapped/regtested on x86_64-linux and i686-linux.  For i686-linux stage3
+ target libraries, just 8 *.o files are affected by the patch; for
x86_64-linux approx. 8% of *.o files are affected, though I haven't checked in
detail how much each file is affected.  So far I've only looked at
x86_64-linux builtins.o, where the patch pessimized the code in a single spot:

-	movq	(%rax,%rdx), %rdi
+	addq	%rax, %rdx
+	movq	(%rdx), %rdi
 	call	_ZL12validate_argPK9tree_node9tree_code

which comes from insn 112 not being combined with 114, because both si and di
are on x86_64 likely spilled non-fixed hard registers and thus we disallowed
LOG_LINKS from before insn 113 into the 113..114 range, out of fear that the
combined pattern might during reload need the hard register that is live
across it:

(insn 112 111 113 17 (parallel [
            (set (reg/f:DI 134)
                (plus:DI (reg:DI 132)
                    (reg/v:DI 113 [ off ])))
            (clobber (reg:CC 17 flags))
        ]) ../../gcc/gimple.h:2066 266 {*adddi_1}
     (expr_list:REG_DEAD (reg:DI 132)
        (expr_list:REG_DEAD (reg/v:DI 113 [ off ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
(insn 113 112 114 17 (set (reg:SI 4 si)
        (reg:SI 94 [ D.97472 ])) ../../gcc/builtins.c:11417 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 94 [ D.97472 ])
        (nil)))
(insn 114 113 115 17 (set (reg:DI 5 di)
        (mem/f:DI (reg/f:DI 134) [8 *_43+0 S8 A64])) ../../gcc/builtins.c:11417 89 {*movdi_internal}
     (expr_list:REG_DEAD (reg/f:DI 134)
        (nil)))
(call_insn/c/i 115 114 116 17 (set (reg:QI 0 ax)
        (call (mem:QI (symbol_ref:DI ("_ZL12validate_argPK9tree_node9tree_code") [flags 0x3] <function_decl 0x7f4045caf200 validate_arg>) [0 validate_arg S1 A8])
            (const_int 0 [0]))) ../../gcc/builtins.c:11417 680 {*call_value}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (expr_list:REG_DEAD (reg:SI 4 si)
            (expr_list:REG_EH_REGION (const_int 0 [0])
                (nil))))
    (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
        (expr_list:REG_BR_PRED (use (reg:SI 4 si))
            (nil))))

Thoughts on this?  I can surely investigate more tomorrow, look at more random
object files that differ and see how many changes the patch creates.
Unfortunately, because the patch works at LOG_LINKS creation time, it is hard
to gather statistics during compilation: while lots of LOG_LINKS will be
affected, only a much smaller number of actual combinations will fail because
of the missing LOG_LINKS.  Typically the call argument load sequence contains
just (set (reg:.. hard reg) (reg:.. pseudo)) insns and we wouldn't combine
into that anyway.

The patch removes the likely_spilled_retval stuff Joern wrote a few years
ago, because I believe this change should obsolete it.  But I have tried to
reproduce the ICE using the sh-elf target and haven't managed to do so; 4.9
is quite a different compiler from 2005-ish gcc.

2014-01-15  Jakub Jelinek  <ja...@redhat.com>

	PR rtl-optimization/59477
	* combine.c (create_log_links): For sequences of insns loading
	hard register parameters for a call, starting with a load of some
	non-fixed likely spilled hard register, don't record LOG_LINKS
	from before the first load to any insn in between the first load
	and the call insn.  For setters of non-fixed likely spilled hard
	registers other than in calls and the above sequences, disallow
	recording of any LOG_LINKS crossing such insns.
	(struct likely_spilled_retval_info): Remove.
	(likely_spilled_retval_1, likely_spilled_retval_p): Remove.
	(try_combine): Don't call likely_spilled_retval_p.

	* g++.dg/torture/pr59477.C: New test.

--- gcc/combine.c.jj	2014-01-15 08:11:22.636430002 +0100
+++ gcc/combine.c	2014-01-15 15:45:01.588819713 +0100
@@ -985,20 +985,54 @@ create_log_links (void)
   basic_block bb;
   rtx *next_use, insn;
   df_ref *def_vec, *use_vec;
+  int *undo_list;
 
   next_use = XCNEWVEC (rtx, max_reg_num ());
+  undo_list = XCNEWVEC (int, max_reg_num ());
 
   /* Pass through each block from the end, recording the uses of each
      register and establishing log links when def is encountered.
      Note that we do not clear next_use array in order to save time, so
      we have to test whether the use is in the same basic block as def.
+     Avoid creating log links across instructions that set hard registers
+     that are likely to be spilled, otherwise reload might die if some
+     instruction that might need some particular hard register is combined
+     into the area where that hard register is already live.  Usually
+     likely spilled non-fixed hard registers should occur in the IL
+     for function call parameter passing and call return value, return
+     value of the whole function and then for explicit register variables.
+
+     For calls expansion in calls.c usually emits instructions setting the
+     hard registers immediately before the call, with simple rhs.  For this
+     case the code below attempts to identify the block of insns between the
+     first insn loading parameter into a hard register where the hard register
+     is likely spilled and last insn loading parameter into (any) hard
+     register (call_block_start and call_block_end).  We allow adding log links
+     to instructions in that range only from instructions in that range, when
+     in the backward walk we reach the call_block_start insn, we remove all
+     next_use elements for insns in that block, using undo_list array
+     and undo_list_head.  That way it is e.g. possible to have log links
+     from before a call argument setup sequence to insns after the call.
+     In the call_block_start .. call_block_end range we don't set
+     likely_spilled_setter_luid though.
+
+     For all other setters of likely spilled non-fixed hard registers, we just
+     disallow any log links across those insns, by setting
+     likely_spilled_setter_luid and not creating any log links pointing to
+     insns with higher luid than that.
+
      There are a few cases below when we do not consider the definition
      or usage -- these are taken from original flow.c did.  Don't ask me
      why it is done this way; I don't know and if it works, I don't want
      to know.  */

   FOR_EACH_BB_FN (bb, cfun)
     {
+      int likely_spilled_setter_luid = -1;
+      int call_block_start = -1;
+      int call_block_end = -1;
+      int undo_list_head = 0;
+
       FOR_BB_INSNS_REVERSE (bb, insn)
         {
           if (!NONDEBUG_INSN_P (insn))
@@ -1007,12 +1041,86 @@ create_log_links (void)
           /* Log links are created only once.  */
           gcc_assert (!LOG_LINKS (insn));
 
+          int luid = DF_INSN_LUID (insn);
+
+          if (CALL_P (insn))
+            {
+              rtx last_reg = NULL_RTX;
+              gcc_assert (call_block_start == -1 && call_block_end == -1);
+              for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
+                   link != NULL_RTX;
+                   link = XEXP (link, 1))
+                if (GET_CODE (XEXP (link, 0)) == USE
+                    && REG_P (XEXP (XEXP (link, 0), 0))
+                    && HARD_REGISTER_P (XEXP (XEXP (link, 0), 0)))
+                  {
+                    rtx reg = XEXP (XEXP (link, 0), 0);
+                    int nregs = hard_regno_nregs[REGNO (reg)][GET_MODE (reg)];
+                    while (nregs-- > 0)
+                      if (! TEST_HARD_REG_BIT (fixed_reg_set,
+                                               REGNO (reg) + nregs)
+                          && targetm.class_likely_spilled_p
+                               (REGNO_REG_CLASS (REGNO (reg) + nregs)))
+                        break;
+                    if (nregs >= 0)
+                      last_reg = reg;
+                  }
+              if (last_reg)
+                {
+                  for (rtx prev = PREV_INSN (insn);
+                       prev; prev = PREV_INSN (prev))
+                    {
+                      if (!NONDEBUG_INSN_P (prev))
+                        continue;
+                      if (BLOCK_FOR_INSN (prev) != bb
+                          || !NONJUMP_INSN_P (prev))
+                        break;
+                      rtx set = single_set (prev);
+                      if (! set)
+                        break;
+                      rtx dest = SET_DEST (set);
+                      if (GET_CODE (dest) == SUBREG)
+                        dest = SUBREG_REG (dest);
+                      if (!REG_P (dest) || !HARD_REGISTER_P (dest))
+                        break;
+                      if (reg_overlap_mentioned_p (dest, last_reg))
+                        {
+                          if (call_block_end != -1)
+                            call_block_start = DF_INSN_LUID (prev);
+                          break;
+                        }
+                      else if (call_block_end == -1)
+                        call_block_end = DF_INSN_LUID (prev);
+                    }
+                  if (call_block_start == -1)
+                    call_block_end = -1;
+                }
+            }
+
           for (def_vec = DF_INSN_DEFS (insn); *def_vec; def_vec++)
             {
               df_ref def = *def_vec;
               int regno = DF_REF_REGNO (def);
               rtx use_insn;
 
+              if (HARD_REGISTER_NUM_P (regno)
+                  && !CALL_P (insn)
+                  && !(luid >= call_block_start && luid <= call_block_end)
+                  && !(DF_REF_FLAGS (def)
+                       & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
+                {
+                  enum machine_mode mode = GET_MODE (DF_REF_REAL_REG (def));
+                  int nregs = hard_regno_nregs[regno][mode];
+                  while (nregs-- > 0)
+                    if (! TEST_HARD_REG_BIT (fixed_reg_set, regno + nregs)
+                        && targetm.class_likely_spilled_p
+                             (REGNO_REG_CLASS (regno + nregs)))
+                      {
+                        likely_spilled_setter_luid = luid;
+                        break;
+                      }
+                }
+
               if (!next_use[regno])
                 continue;
 
@@ -1034,7 +1142,10 @@ create_log_links (void)
                 continue;
 
               use_insn = next_use[regno];
-              if (BLOCK_FOR_INSN (use_insn) == bb)
+              if (BLOCK_FOR_INSN (use_insn) == bb
+                  && (likely_spilled_setter_luid == -1
+                      || (DF_INSN_LUID (use_insn)
+                          <= likely_spilled_setter_luid)))
                 {
                   /* flow.c claimed:
 
@@ -1060,6 +1171,33 @@ create_log_links (void)
               next_use[regno] = NULL_RTX;
             }
 
+          if (luid == call_block_start)
+            {
+              /* At the start of the call argument block, remove
+                 all next_use array values pointing to instructions
+                 within the block.  The undo_list array normally contains
+                 all zeros, non-zero values form a chain of what needs
+                 to be cleared in next_use array.  Value of -1 in the array
+                 means tail of the chain, other non-zero values are next
+                 regno that need to be cleared plus 1 (as zero is a valid
+                 register number).  */
+              if (undo_list_head)
+                {
+                  int r = undo_list_head;
+                  do
+                    {
+                      int regno = r - 1;
+                      r = undo_list[regno];
+                      undo_list[regno] = 0;
+                      next_use[regno] = NULL_RTX;
+                    }
+                  while (r != -1);
+                  undo_list_head = 0;
+                }
+              call_block_start = -1;
+              call_block_end = -1;
+            }
+
           for (use_vec = DF_INSN_USES (insn); *use_vec; use_vec++)
             {
               df_ref use = *use_vec;
@@ -1070,11 +1208,23 @@ create_log_links (void)
               if (DF_REF_FLAGS (use) & DF_REF_CALL_STACK_USAGE)
                 continue;
 
+              if (luid >= call_block_start
+                  && luid <= call_block_end
+                  && undo_list[regno] == 0)
+                {
+                  if (undo_list_head == 0)
+                    undo_list[regno] = -1;
+                  else
+                    undo_list[regno] = undo_list_head;
+                  undo_list_head = regno + 1;
+                }
+
               next_use[regno] = insn;
             }
         }
     }
 
+  free (undo_list);
   free (next_use);
 }
 
@@ -2222,87 +2372,6 @@ cant_combine_insn_p (rtx insn)
   return 0;
 }
 
-struct likely_spilled_retval_info
-{
-  unsigned regno, nregs;
-  unsigned mask;
-};
-
-/* Called via note_stores by likely_spilled_retval_p.  Remove from info->mask
-   hard registers that are known to be written to / clobbered in full.  */
-static void
-likely_spilled_retval_1 (rtx x, const_rtx set, void *data)
-{
-  struct likely_spilled_retval_info *const info =
-    (struct likely_spilled_retval_info *) data;
-  unsigned regno, nregs;
-  unsigned new_mask;
-
-  if (!REG_P (XEXP (set, 0)))
-    return;
-  regno = REGNO (x);
-  if (regno >= info->regno + info->nregs)
-    return;
-  nregs = hard_regno_nregs[regno][GET_MODE (x)];
-  if (regno + nregs <= info->regno)
-    return;
-  new_mask = (2U << (nregs - 1)) - 1;
-  if (regno < info->regno)
-    new_mask >>= info->regno - regno;
-  else
-    new_mask <<= regno - info->regno;
-  info->mask &= ~new_mask;
-}
-
-/* Return nonzero iff part of the return value is live during INSN, and
-   it is likely spilled.  This can happen when more than one insn is needed
-   to copy the return value, e.g. when we consider to combine into the
-   second copy insn for a complex value.  */
-
-static int
-likely_spilled_retval_p (rtx insn)
-{
-  rtx use = BB_END (this_basic_block);
-  rtx reg, p;
-  unsigned regno, nregs;
-  /* We assume here that no machine mode needs more than
-     32 hard registers when the value overlaps with a register
-     for which TARGET_FUNCTION_VALUE_REGNO_P is true.  */
-  unsigned mask;
-  struct likely_spilled_retval_info info;
-
-  if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != USE || insn == use)
-    return 0;
-  reg = XEXP (PATTERN (use), 0);
-  if (!REG_P (reg) || !targetm.calls.function_value_regno_p (REGNO (reg)))
-    return 0;
-  regno = REGNO (reg);
-  nregs = hard_regno_nregs[regno][GET_MODE (reg)];
-  if (nregs == 1)
-    return 0;
-  mask = (2U << (nregs - 1)) - 1;
-
-  /* Disregard parts of the return value that are set later.  */
-  info.regno = regno;
-  info.nregs = nregs;
-  info.mask = mask;
-  for (p = PREV_INSN (use); info.mask && p != insn; p = PREV_INSN (p))
-    if (INSN_P (p))
-      note_stores (PATTERN (p), likely_spilled_retval_1, &info);
-  mask = info.mask;
-
-  /* Check if any of the (probably) live return value registers is
-     likely spilled.  */
-  nregs --;
-  do
-    {
-      if ((mask & 1 << nregs)
-	  && targetm.class_likely_spilled_p (REGNO_REG_CLASS (regno + nregs)))
-	return 1;
-    } while (nregs--);
-  return 0;
-}
-
 /* Adjust INSN after we made a change to its destination.
 
    Changing the destination can invalidate notes that say something about
@@ -2512,8 +2581,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx
   if (cant_combine_insn_p (i3)
       || cant_combine_insn_p (i2)
       || (i1 && cant_combine_insn_p (i1))
-      || (i0 && cant_combine_insn_p (i0))
-      || likely_spilled_retval_p (i3))
+      || (i0 && cant_combine_insn_p (i0)))
     return 0;
 
   combine_attempts++;
--- gcc/testsuite/g++.dg/torture/pr59477.C.jj	2014-01-15 09:38:18.348864654 +0100
+++ gcc/testsuite/g++.dg/torture/pr59477.C	2014-01-15 09:38:18.348864654 +0100
@@ -0,0 +1,25 @@
+// PR rtl-optimization/59477
+// { dg-do compile }
+
+struct A
+{
+  unsigned *a;
+  unsigned long b;
+  A (unsigned long x) : a (), b (x) {}
+};
+
+struct B
+{
+  B (int);
+  B (const B &) {}
+};
+
+B bar (B, B, A);
+int v;
+
+void
+foo ()
+{
+  B c = 0;
+  bar (c, c, A (1ULL << v));
+}

	Jakub