Hi! The following testcase as well as the PR51924 testcase show that it is a really bad idea to rely on the UD/DU DF links in the ree pass if we keep removing them while the pass goes on, and just assume that the insns feeding the now removed UD links have been properly adjusted and are suitable for the change.
In particular, this testcase (and even smaller: unsigned long baz (unsigned short *s, int i) { unsigned short x = s[i]; if (x < 0x211) return x; asm volatile (""); return (unsigned char) x; } ) is miscompiled because we have a HImode memory load that feeds a HImode comparison, HI->DImode zero extension and a QI->DImode zero extension. Both zero extensions are processed as candidates for removal, the HI->DImode zero extension verifies the HImode memory load has the right mode matching the ZERO_EXTEND operand's mode, and replaces it with a HI->DImode zero extending load and schedules the HI->DImode zero extension non-load insn for removal. As the load has been modified, its links have been invalidated (during apply_changes_group df_rescan_insn). When the QI->DImode zero extension is considered, we see that it has no UD links and think that must mean all the definitions have been suitably adjusted. They have been adjusted, but not suitably (i.e. just HI->DI instead of QI->DI) and schedule the QI->DImode zero extension for removal. This patch fixes this by using DF_DEFER_INSN_RESCAN, so we keep the UD/DU links for the whole duration of the pass, this pass doesn't add new instructions, only changes them (and schedules insns for removal, but that is done as the last step for all to be deleted insns at once). Thus we don't need to assume missing links mean something, we can still walk them all and verify. There is an array recording what insns have been changed and how (indexed by INSN_UID), so that we don't have to guess what changes we've done in the past (we simplify_rtx the zero/sign extensions, adjust const_ints in SET_SRC, etc.). I'm reverting Eric's recent changes (keeping all the cleanups), because it is no longer necessary. This patch fixes this testcase, Eric's PR51924 as well as x86_64 -m64 jcf-parse.c miscompilation (that has been also fixed by Eric's patch already; found this by doing extra verification bootstraps with earlier versions of the patch). In addition to that change the patch decreases the amount of malloc/free calls (with each candidate insn being checked we'd malloc 3 to 4 vectors and freed them shortly afterwards, now we keep the vectors around for the whole pass, and just VEC_truncate them to 0 each time, which is much cheaper). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? For 4.8, the pass could try harder, e.g. on: int foo (unsigned char *x, unsigned int y) { unsigned short a = x[y]; a = (a << 8) | x[y + 1]; return a; } we don't currently remove the useless movzwl %ax, %eax at the end of the function, because it is fed by a SImode IOR and thus the ext_src_mode check fails (well, we don't have a zero extending HImode -> DImode ior insn anyway). But we could in that case process it similarly to conditional moves (at least for IOR, perhaps others) - by processing both the operands of the IOR instead, there we'd find out that one operand is QI->SImode zero extended already (i.e. fine and adjustable to QI->DImode zero extension at no cost) and the other is QI->SImode zero extended shifted up by 8 (again, could be adjusted on both the load and shift at no cost). 2012-01-23 Jakub Jelinek <ja...@redhat.com> PR rtl-optimization/51933 * ree.c (transform_ifelse): Return true right away if dstreg is already wider or equal to cand->mode. (enum ext_modified_kind, struct ext_modified, ext_state): New types. (make_defs_and_copies_lists): Remove defs_list and copies_list arguments, add state argument, just truncate state->work_list instead of always allocating and freeing the vector. Assert that get_defs succeeds instead of returning 2. Changed return type to bool. (merge_def_and_ext): Add state argument. If SET_DEST doesn't have ext_src_mode, see if it has been modified already with the right kind of extension and has been extended before from the ext_src_mode. If SET_DEST is already wider or equal to cand->mode, just return true. Remember the original mode in state->modified array. (combine_reaching_defs): Add state argument. Don't allocate and free here def_list, copied_list and vec vectors, instead just VEC_truncate the vectors in *state. Don't handle outcome == 2 here. (find_and_remove_re): Set DF_DEFER_INSN_RESCAN df flag. Add state variable, clear vectors in it, initialize state.modified if needed. Free all the vectors at the end and state.modified too. Don't skip a candidate if the extension expression has been modified. * gcc.c-torture/execute/pr51933.c: New test. --- gcc/ree.c.jj 2012-01-22 16:02:10.000000000 +0100 +++ gcc/ree.c 2012-01-22 19:43:04.590621478 +0100 @@ -380,6 +380,11 @@ transform_ifelse (ext_cand *cand, rtx de dstreg = SET_DEST (set_insn); srcreg = XEXP (SET_SRC (set_insn), 1); srcreg2 = XEXP (SET_SRC (set_insn), 2); + /* If the conditional move already has the right or wider mode, + there is nothing to do. */ + if (GET_MODE_SIZE (GET_MODE (dstreg)) >= GET_MODE_SIZE (cand->mode)) + return true; + map_srcreg = gen_rtx_REG (cand->mode, REGNO (srcreg)); map_srcreg2 = gen_rtx_REG (cand->mode, REGNO (srcreg2)); map_dstreg = gen_rtx_REG (cand->mode, REGNO (dstreg)); @@ -466,43 +471,76 @@ is_cond_copy_insn (rtx insn, rtx *reg1, return false; } +enum ext_modified_kind +{ + /* The insn hasn't been modified by ree pass yet. */ + EXT_MODIFIED_NONE, + /* Changed into zero extension. */ + EXT_MODIFIED_ZEXT, + /* Changed into sign extension. */ + EXT_MODIFIED_SEXT +}; + +struct ext_modified +{ + /* Mode from which ree has zero or sign extended the destination. */ + ENUM_BITFIELD(machine_mode) mode : 8; + + /* Kind of modification of the insn. */ + ENUM_BITFIELD(ext_modified_kind) kind : 2; + + /* True if the insn is scheduled to be deleted. */ + unsigned int deleted : 1; +}; + +/* Vectors used by combine_reaching_defs and its helpers. */ +typedef struct ext_state +{ + /* In order to avoid constant VEC_alloc/VEC_free, we keep these + 4 vectors live through the entire find_and_remove_re and just + VEC_truncate them each time. */ + VEC (rtx, heap) *defs_list; + VEC (rtx, heap) *copies_list; + VEC (rtx, heap) *modified_list; + VEC (rtx, heap) *work_list; + + /* For instructions that have been successfully modified, this is + the original mode from which the insn is extending and + kind of extension. */ + struct ext_modified *modified; +} ext_state; + /* Reaching Definitions of the extended register could be conditional copies or regular definitions. This function separates the two types into two - lists, DEFS_LIST and COPIES_LIST. This is necessary because, if a reaching - definition is a conditional copy, merging the extension with this definition - is wrong. Conditional copies are merged by transitively merging their - definitions. The defs_list is populated with all the reaching definitions - of the extension instruction (EXTEND_INSN) which must be merged with an - extension. The copies_list contains all the conditional moves that will - later be extended into a wider mode conditional move if all the merges are - successful. The function returns 0 upon failure, 1 upon success and 2 when - all definitions of the EXTEND_INSN have been previously merged. */ + lists, STATE->DEFS_LIST and STATE->COPIES_LIST. This is necessary because, + if a reaching definition is a conditional copy, merging the extension with + this definition is wrong. Conditional copies are merged by transitively + merging their definitions. The defs_list is populated with all the reaching + definitions of the extension instruction (EXTEND_INSN) which must be merged + with an extension. The copies_list contains all the conditional moves that + will later be extended into a wider mode conditional move if all the merges + are successful. The function returns 0 upon failure, 1 upon success. */ -static int +static bool make_defs_and_copies_lists (rtx extend_insn, rtx set_pat, - VEC (rtx,heap) **defs_list, - VEC (rtx,heap) **copies_list) + ext_state *state) { - VEC (rtx,heap) *work_list = VEC_alloc (rtx, heap, 8); rtx src_reg = XEXP (SET_SRC (set_pat), 0); bool *is_insn_visited; - int ret = 1; + bool ret = true; + + VEC_truncate (rtx, state->work_list, 0); /* Initialize the work list. */ - if (!get_defs (extend_insn, src_reg, &work_list)) - { - VEC_free (rtx, heap, work_list); - /* The number of defs being equal to zero can only mean that all the - definitions have been previously merged. */ - return 2; - } + if (!get_defs (extend_insn, src_reg, &state->work_list)) + gcc_unreachable (); is_insn_visited = XCNEWVEC (bool, max_insn_uid); /* Perform transitive closure for conditional copies. */ - while (!VEC_empty (rtx, work_list)) + while (!VEC_empty (rtx, state->work_list)) { - rtx def_insn = VEC_pop (rtx, work_list); + rtx def_insn = VEC_pop (rtx, state->work_list); rtx reg1, reg2; gcc_assert (INSN_UID (def_insn) < max_insn_uid); @@ -514,22 +552,21 @@ make_defs_and_copies_lists (rtx extend_i if (is_cond_copy_insn (def_insn, ®1, ®2)) { /* Push it onto the copy list first. */ - VEC_safe_push (rtx, heap, *copies_list, def_insn); + VEC_safe_push (rtx, heap, state->copies_list, def_insn); /* Now perform the transitive closure. */ - if (!get_defs (def_insn, reg1, &work_list) - || !get_defs (def_insn, reg2, &work_list)) + if (!get_defs (def_insn, reg1, &state->work_list) + || !get_defs (def_insn, reg2, &state->work_list)) { - ret = 0; + ret = false; break; } } else - VEC_safe_push (rtx, heap, *defs_list, def_insn); + VEC_safe_push (rtx, heap, state->defs_list, def_insn); } XDELETEVEC (is_insn_visited); - VEC_free (rtx, heap, work_list); return ret; } @@ -538,7 +575,7 @@ make_defs_and_copies_lists (rtx extend_i on the SET pattern. */ static bool -merge_def_and_ext (ext_cand *cand, rtx def_insn) +merge_def_and_ext (ext_cand *cand, rtx def_insn, ext_state *state) { enum machine_mode ext_src_mode; enum rtx_code code; @@ -577,10 +614,27 @@ merge_def_and_ext (ext_cand *cand, rtx d gcc_assert (sub_rtx != NULL); - if (GET_CODE (SET_DEST (*sub_rtx)) == REG - && GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode) - { - return combine_set_extension (cand, def_insn, sub_rtx); + if (REG_P (SET_DEST (*sub_rtx)) + && (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode + || ((state->modified[INSN_UID (def_insn)].kind + == (cand->code == ZERO_EXTEND + ? EXT_MODIFIED_ZEXT : EXT_MODIFIED_SEXT)) + && state->modified[INSN_UID (def_insn)].mode + == ext_src_mode))) + { + if (GET_MODE_SIZE (GET_MODE (SET_DEST (*sub_rtx))) + >= GET_MODE_SIZE (cand->mode)) + return true; + /* If def_insn is already scheduled to be deleted, don't attempt + to modify it. */ + if (state->modified[INSN_UID (def_insn)].deleted) + return false; + if (combine_set_extension (cand, def_insn, sub_rtx)) + { + if (state->modified[INSN_UID (def_insn)].kind == EXT_MODIFIED_NONE) + state->modified[INSN_UID (def_insn)].mode = ext_src_mode; + return true; + } } return false; @@ -596,48 +650,31 @@ merge_def_and_ext (ext_cand *cand, rtx d and false upon failure. */ static bool -combine_reaching_defs (ext_cand *cand, rtx set_pat) +combine_reaching_defs (ext_cand *cand, rtx set_pat, ext_state *state) { rtx def_insn; bool merge_successful = true; int i; int defs_ix; - int outcome; - VEC (rtx, heap) *defs_list, *copies_list, *vec; - - defs_list = VEC_alloc (rtx, heap, 8); - copies_list = VEC_alloc (rtx, heap, 8); + bool outcome; - outcome = make_defs_and_copies_lists (cand->insn, - set_pat, &defs_list, &copies_list); + VEC_truncate (rtx, state->defs_list, 0); + VEC_truncate (rtx, state->copies_list, 0); - /* outcome == 2 means that all the definitions for this extension have been - previously merged when handling other extensions. */ - if (outcome == 2) - { - VEC_free (rtx, heap, defs_list); - VEC_free (rtx, heap, copies_list); - if (dump_file) - fprintf (dump_file, "All definitions have been previously merged.\n"); - return true; - } + outcome = make_defs_and_copies_lists (cand->insn, set_pat, state); - if (outcome == 0) - { - VEC_free (rtx, heap, defs_list); - VEC_free (rtx, heap, copies_list); - return false; - } + if (!outcome) + return false; merge_successful = true; /* Go through the defs vector and try to merge all the definitions in this vector. */ - vec = VEC_alloc (rtx, heap, 8); - FOR_EACH_VEC_ELT (rtx, defs_list, defs_ix, def_insn) + VEC_truncate (rtx, state->modified_list, 0); + FOR_EACH_VEC_ELT (rtx, state->defs_list, defs_ix, def_insn) { - if (merge_def_and_ext (cand, def_insn)) - VEC_safe_push (rtx, heap, vec, def_insn); + if (merge_def_and_ext (cand, def_insn, state)) + VEC_safe_push (rtx, heap, state->modified_list, def_insn); else { merge_successful = false; @@ -649,12 +686,10 @@ combine_reaching_defs (ext_cand *cand, r the copies in this vector. */ if (merge_successful) { - FOR_EACH_VEC_ELT (rtx, copies_list, i, def_insn) + FOR_EACH_VEC_ELT (rtx, state->copies_list, i, def_insn) { if (transform_ifelse (cand, def_insn)) - { - VEC_safe_push (rtx, heap, vec, def_insn); - } + VEC_safe_push (rtx, heap, state->modified_list, def_insn); else { merge_successful = false; @@ -675,9 +710,12 @@ combine_reaching_defs (ext_cand *cand, r if (dump_file) fprintf (dump_file, "All merges were successful.\n"); - VEC_free (rtx, heap, vec); - VEC_free (rtx, heap, defs_list); - VEC_free (rtx, heap, copies_list); + FOR_EACH_VEC_ELT (rtx, state->modified_list, i, def_insn) + if (state->modified[INSN_UID (def_insn)].kind == EXT_MODIFIED_NONE) + state->modified[INSN_UID (def_insn)].kind + = (cand->code == ZERO_EXTEND + ? EXT_MODIFIED_ZEXT : EXT_MODIFIED_SEXT); + return true; } else @@ -689,7 +727,7 @@ combine_reaching_defs (ext_cand *cand, r { fprintf (dump_file, "Merge cancelled, non-mergeable definitions:\n"); - FOR_EACH_VEC_ELT (rtx, vec, i, def_insn) + FOR_EACH_VEC_ELT (rtx, state->modified_list, i, def_insn) print_rtl_single (dump_file, def_insn); } } @@ -700,10 +738,6 @@ combine_reaching_defs (ext_cand *cand, r cancel_changes (0); } - VEC_free (rtx, heap, vec); - VEC_free (rtx, heap, defs_list); - VEC_free (rtx, heap, copies_list); - return false; } @@ -829,26 +863,30 @@ find_and_remove_re (void) int num_re_opportunities = 0, num_realized = 0, i; VEC (ext_cand, heap) *reinsn_list; VEC (rtx, heap) *reinsn_del_list; + ext_state state; /* Construct DU chain to get all reaching definitions of each extension instruction. */ df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN); df_analyze (); + df_set_flags (DF_DEFER_INSN_RESCAN); max_insn_uid = get_max_uid (); - reinsn_del_list = VEC_alloc (rtx, heap, 4); + reinsn_del_list = NULL; reinsn_list = find_removable_extensions (); + state.defs_list = NULL; + state.copies_list = NULL; + state.modified_list = NULL; + state.work_list = NULL; + if (VEC_empty (ext_cand, reinsn_list)) + state.modified = NULL; + else + state.modified = XCNEWVEC (struct ext_modified, max_insn_uid); FOR_EACH_VEC_ELT (ext_cand, reinsn_list, i, curr_cand) { num_re_opportunities++; - /* If the candidate insn is itself a definition insn for another - candidate, it may have been modified and the UD chain broken. - FIXME: the handling of successive extensions can be improved. */ - if (!reg_mentioned_p (curr_cand->expr, PATTERN (curr_cand->insn))) - continue; - /* Try to combine the extension with the definition. */ if (dump_file) { @@ -856,12 +894,13 @@ find_and_remove_re (void) print_rtl_single (dump_file, curr_cand->insn); } - if (combine_reaching_defs (curr_cand, PATTERN (curr_cand->insn))) + if (combine_reaching_defs (curr_cand, PATTERN (curr_cand->insn), &state)) { if (dump_file) fprintf (dump_file, "Eliminated the extension.\n"); num_realized++; VEC_safe_push (rtx, heap, reinsn_del_list, curr_cand->insn); + state.modified[INSN_UID (curr_cand->insn)].deleted = 1; } } @@ -871,6 +910,11 @@ find_and_remove_re (void) VEC_free (ext_cand, heap, reinsn_list); VEC_free (rtx, heap, reinsn_del_list); + VEC_free (rtx, heap, state.defs_list); + VEC_free (rtx, heap, state.copies_list); + VEC_free (rtx, heap, state.modified_list); + VEC_free (rtx, heap, state.work_list); + XDELETEVEC (state.modified); if (dump_file && num_re_opportunities > 0) fprintf (dump_file, "Elimination opportunities = %d realized = %d\n", --- gcc/testsuite/gcc.c-torture/execute/pr51933.c.jj 2012-01-22 19:44:21.950166396 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr51933.c 2012-01-22 19:44:21.950166396 +0100 @@ -0,0 +1,51 @@ +/* PR rtl-optimization/51933 */ + +static signed char v1; +static unsigned char v2[256], v3[256]; + +__attribute__((noclone, noinline)) void +foo (void) +{ + asm volatile ("" : : "g" (&v1), "g" (&v2[0]), "g" (&v3[0]) : "memory"); +} + +__attribute__((noclone, noinline)) int +bar (const int x, const unsigned short *y, char *z) +{ + int i; + unsigned short u; + if (!v1) + foo (); + for (i = 0; i < x; i++) + { + u = y[i]; + z[i] = u < 0x0100 ? v2[u] : v3[u & 0xff]; + } + z[x] = '\0'; + return x; +} + +int +main (void) +{ + char buf[18]; + unsigned short s[18]; + unsigned char c[18] = "abcdefghijklmnopq"; + int i; + for (i = 0; i < 256; i++) + { + v2[i] = i; + v3[i] = i + 1; + } + for (i = 0; i < 18; i++) + s[i] = c[i]; + s[5] |= 0x600; + s[6] |= 0x500; + s[11] |= 0x2000; + s[15] |= 0x500; + foo (); + if (bar (17, s, buf) != 17 + || __builtin_memcmp (buf, "abcdeghhijkmmnoqq", 18) != 0) + __builtin_abort (); + return 0; +} Jakub