Hi Jeff, Appreciate your effort to review my patch! I've updated my patch as attached. See my answers below.
> in current function, so the store speculation can be avoided. > So at a high level should we be doing this in gimple rather than RTL? > We're going to have a lot more information about types, better > infrastructure for looking at uses/defs, access to the alias oracle, we should > be able to accurately distinguish between potentially shared objects vs those > which are local to the thread, etc. We lose the low level costing information > though. > > I'm still going to go through the patch and do some level of review, but I do > think we need to answer the higher level question though. > I have the following reasons, 1) Following the clue Richard B gave me before about parameter --param allow-store-data-races, I did check the middle-end pass tree-if-conv, but I think this pass at the moment doesn't work for the issue I'm trying to solve. Tree-if-conv is to do if conversion for loop, and its final goal is to help loop vectorization, while my case doesn't have a loop at all. 2) My current solution fits into current back-end if-conversion pass very well. I don't want to invent a new framework to solve this relatively small issue. Besides, this back-end patch doesn't only enhance store speculation detection, but also fix a bug in the original code. > Nits: We typically refer to parameters, variables, etc in comments using > upper case. You'll need to review the entire patch for these its. > > So perhaps the comment should be something like: > > /* Return true of X, a MEM expression, is on the stack. A_INSN contains > X if A_INSN exists. */ > Fixed in attached new patch. > > Just from a design standpoint, what are the consequences if this function > returns true for something that isn't actually in the stack or false for > something that is on the stack? > If noce_mem_is_on_stack returns true for something that isn't actually in the stack, it could potentially introduce store speculation, then the if-conversion optimization will be incorrect. If this function returns false for something that is on stack, it doesn't matter, because the optimization will not be triggered. > Nit: Space between the function name and its open paren for arguments. ie > > if (fixed_base_plus_p (a) > ^ > I see other instances of this nit, please review the patch and correct them. > Fixed in attached new patch. > > > + > > + if (!a_insn) > > + return false; > I'm not sure what the calling profile is for this function, but this is a > cheaper > test, so you might consider moving it before the test of fixed_base_plus_p. > Fixed in attached new patch. > > > + > > + if (!reg_mentioned_p (x, a_insn)) > > + return false; > > + > > + /* Check if x is on stack. Assume a mem expression using registers > > + related to stack register is always on stack. */ > > + FOR_EACH_INSN_USE (use, a_insn) > > + if (reg_mentioned_p (DF_REF_REG (use), x) > > + && bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use))) > > + return true; > > + > > + return false; > > +} > So is X always a MEM? Just wanted to make sure I understand. > reg_mentioned_p will do the right thing (using rtx_equal_p) for the > comparison. > Yes. X is always a MEM. There is an assertion for this in the code above. > > > + > > +/* Always return true, if there is a dominating write. > > + > > + When there is a dominating read from memory on stack, > > + 1) if x = a is a memory read, return true. > > + 2) if x = a is a memory write, return true if the memory is on stack. > > + This is the guarantee the memory is *not* readonly. */ > > + > > +static bool > > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn, > > + const_rtx x, bool is_store) { > > + rtx_insn *insn; > > + rtx set; > > + > > + gcc_assert (MEM_P (x)); > > + > > + FOR_BB_INSNS (bb, insn) > > + { > > + set = single_set (insn); > > + if (!set) > > + continue; > > + > > + /* Dominating store */ > > + if (rtx_equal_p (x, SET_DEST (set))) > > + return true; > > + > > + /* Dominating load */ > > + if (rtx_equal_p (x, SET_SRC (set))) > > + if (is_store && noce_mem_is_on_stack (a_insn, x)) > > + return true; > > + } > > + > > + return false; > > +} > So what would be the consequences here of returning false when in fact > there was a dominating read or write? That could easily happen if the > dominating read or write was not a single_set. If return false when in fact there is a dominating read or write, the optimization will not be triggered, because noce_mem_maybe_invalid_p will be true, which is playing the same role as may_trap_or_fault_p. > > I'm guessing that from a design standpoint you're trying to find cases where > you know an object was written to within the block and does not escape. So > returning false when there was a dominating write is safe. > Returning true when there was not would be disastrous. Right? > YES! You've completely understood my code! 😊 > > > > > > @@ -5347,6 +5462,234 @@ > > > > return FALSE; > > } > > + > > + > > +/* Find all insns that must be stack address and store REGNO into > > + bitmap bba_sets_must_be_sfp. */ > > + > > +static void > > +find_all_must_be_sfp_insns (void) > > +{ > > + rtx_insn *a_insn; > > + basic_block bb; > > + bool sfp_found; > > + auto_vec<int, 1> def_count; > > + df_ref def, use; > > + > > + /* Calculate def counts for each insn. */ > > + def_count.safe_grow_cleared (max_reg_num () + 1); FOR_ALL_BB_FN > > + (bb, cfun) > > + FOR_BB_INSNS (bb, a_insn) > > + { > > + if (!INSN_P (a_insn)) > > + continue; > > + > > + FOR_EACH_INSN_DEF (def, a_insn) > > + def_count[DF_REF_REGNO (def)]++; > > + }Is there a reason why you can't use the information computed by > reginfo? > Fixed in attached new patch. > > > > > + > > + /* Iterate all insns until finding all stack pointers, and store > > + them into bba_sets_must_be_sfp. */ bba_sets_must_be_sfp = > > + BITMAP_ALLOC (®_obstack); do > > + { > > + sfp_found = false; > > + FOR_ALL_BB_FN (bb, cfun) > > + FOR_BB_INSNS (bb, a_insn) > > + { > > + rtx sset_a = single_set (a_insn); > > + if (!sset_a) > > + continue; > > + rtx src = SET_SRC (sset_a); > > + rtx dest = SET_DEST (sset_a); > So again, we can get things that aren't a single_set here and they'll be > ignored. But I believe that's safe as you're trying to identify insns which > store > stack addresses into a REG. Missing a more complicated case where a stack > address is stored into a REG is safe. Right? > Yes. Missing some pointers to stack is safe here, because we will use it to check if a memory in if-conversion candidate doesn't have store speculation. If we miss it, the optimization will not be triggered, so there will not be risky. > > > > > + > > + if (!REG_P (dest)) > > + continue; > > + > > + /* For the case below, > > + Control flow: B1->B3, B2->B3 > > + B1: p = &local_var > > + B2: p = &global_var > > + B3: ... = *p > > + pointer p is an address for either local or global variable. > > + so we don't treat p as a stack address pointer. To make > > + algorithm simple, we ignore all non-SSA cases. */ > > + bool skip_insn = false; > > + unsigned int dest_regno = 0; > > + FOR_EACH_INSN_DEF (def, a_insn) > > + { > > + dest_regno = DF_REF_REGNO (def); > > + /* Skip current insn if > > + 1) already marked as stack pointer. > > + 2) or see multiple definition points. */ > > + if (bitmap_bit_p (bba_sets_must_be_sfp, dest_regno) > > + || def_count[dest_regno] > 1) > > + { > > + skip_insn = true; > > + break; > > + } > > + } > > + if (skip_insn) > > + continue; > Right. Again I wonder if you should be using the info computed by regstat. > You can get single use information easily from that. > Yes. Fixed in attached new patch. > > + > > + /* Handle case like "x1 = sp + offset". */ > > + if (fixed_base_plus_p (src)) > > + { > > + bitmap_set_bit (bba_sets_must_be_sfp, dest_regno); > > + sfp_found = true; > > + continue; > > + } > Looks like your you've got an indentation problem here. Most likely you've > got a case where you've got 8 spaces that should be replaced with a tab. > Fixed in attached new patch. > > > + > > + /* Handle case like "x2 = x1 + offset", in which x1 is already > > + identified as a stack pointer. */ > > + if ((GET_CODE (src) == PLUS || GET_CODE (src) == MINUS) > > + && CONST_INT_P (XEXP (src, 1))) > > + { > > + rtx x1 = XEXP (src, 0); > > + if (!REG_P (x1)) > > + continue; > > + > > + FOR_EACH_INSN_USE (use, a_insn) > > + { > > + if (!rtx_equal_p (x1, DF_REF_REG (use))) > > + continue; > > + > > + if (bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO > (use))) > > + { > > + bitmap_set_bit (bba_sets_must_be_sfp, dest_regno); > > + sfp_found = true; > > + break; > > + } > > + } > So how much is there to be gained from going from this kind of localized > computation to global? It shouldn't be hard to turn this into a truly global > algorithm, but I'm not sure it's worth the effort. > > I _do_ think it's worth visiting blocks in dominator order. It's better than > what you're doing, but nowhere near as expensive as a global propagation > engine. > Fixed in attached new patch using dom_walk class. Please review again. > Is it worth handling simple copies in the code above? I thought it less likely to have a copy for pointer to stack before register allocation. Right? > > > > + } > > + } > > + } > > + while (sfp_found); > > +} > > + > > +/* Find all insns that may be stack pointer and store REGNO into > > + bitmap bba_sets_may_be_sfp. We iterate all insns in current func > > + until no more latent insns generating stack address are found. > > + The collection of stack pointer bba_sets_may_be_sfp will be used > > + to help analyze local stack variable address taken. Stack variable > > + address can be passed out of current frame if only any stack pointer > > + is passed to hard register or stored into memory. */ > Shouldn't "may be stack pointer" be "must be stack pointer"? > There is a typo in the comments. Fixed in attached new patch. > > > + > > +static void > > +find_all_may_be_sfp_insns (void) > > +{ > > + rtx_insn *a_insn; > > + basic_block bb; > > + bool sfp_found; > > + bba_sets_may_be_sfp = BITMAP_ALLOC (®_obstack); > > + > > + /* Iterate all insns that may be stack pointers, and store them into > > + bitmap bba_sets_must_be_sfp. */ > > + do > > + { > > + sfp_found = false; > > + FOR_ALL_BB_FN (bb, cfun) > Again, you may ultimately be better off with a dominator walk here. Yes. Fixed in attached new patch. > > > > + FOR_BB_INSNS (bb, a_insn) > > + { > > + /* Assume stack pointers can only be generated by single SET. > > */ > > + rtx sset_a = single_set (a_insn); > > + if (!sset_a) > > + continue; > > + rtx src = SET_SRC (sset_a); > > + rtx dest = SET_DEST (sset_a); > I'm not sure that's a safe assumption. Fixed in attached new patch. > > > > + > > + /* If we see "hard_register = ...", which implies passing out > > + of current frame potentially, we don't collect this case. > > + It can be treated as address taken in > > no_stack_address_taken */ > > + if (HARD_REGISTER_P (dest)) > > + continue; > > + > > + /* Memory load and store are not to generate stack pointer. */ > > + if (MEM_P (src) || MEM_P (dest)) > > + continue; > > + > > + /* Skip insn that is already identified as stack pointer. */ > > + bool skip_insn = false; > > + df_ref def; > > + unsigned int dest_regno = 0; > > + FOR_EACH_INSN_DEF (def, a_insn) > > + { > > + dest_regno = DF_REF_REGNO (def); > > + if (bitmap_bit_p (bba_sets_may_be_sfp, dest_regno)) > > + { > > + skip_insn = true; > > + break; > > + } > > + } > So earlier you've verified you've got a single set. I'm not sure you need an > iterator over the defs. Can't you just look at SET_DEST > (sset_a) which is conveniently in DEST? Fixed in attached new patch. > > > I don't like the term "stack pointer" here -- most of us read "stack pointer" > and we think the register holding the current stack pointer. > Would "pointer into the stack" be more appropriate? > Fixed in attached new patch. > > > +/* Return true if current function doesn't pass stack address out of > > +frame. */ > > + > > +static bool > > +no_stack_address_taken (void) > > +{ > > + basic_block bb; > > + rtx_insn *a_insn; > > + df_ref use; > > + > > + FOR_ALL_BB_FN (bb, cfun) > > + FOR_BB_INSNS (bb, a_insn) > > + { > > + if (!INSN_P (a_insn)) > > + continue; > > + > > + rtx sset_a = single_set (a_insn); > > + if (!sset_a) > > + continue; > > + rtx src = SET_SRC (sset_a); > > + rtx dest = SET_DEST (sset_a); > > + > > + /* Skip insn that is already identified as frame pointers. */ > > + if (bitmap_bit_p (bba_sets_may_be_sfp, REGNO (dest))) > > + continue; > > + > > + FOR_EACH_INSN_USE (use, a_insn) > > + { > > + /* Check if current insn is using any stack pointer. Ignore > > + if it doesn't use frame pointers at all. */ > > + if (!bitmap_bit_p (bba_sets_may_be_sfp, DF_REF_REGNO (use))) > > + continue; > > + > > + /* Load cannot introduce address taken. */ > > + if (MEM_P (src)) > > + continue; > > + > > + /* Store is safe if only the src doesn't contain stack > > pointer. */ > > + if (MEM_P (dest) > > + && !reg_mentioned_p (DF_REF_REG (use), src)) > > + continue; > > + > > + /* All other cases potentially introduce address taken. */ > > + return false; > > + } > So what if you have a PARALLEL where one entry causes an escape of a > pointer into the stack and another entry does some other operation? If so, > don't you miss the fact that you've got an escaping pointer into the stack? I > don't think you can restrict yourself to single_sets here. Yes. You are right. I have added code to handle PARALLEL case. I handle it in a rather conservative way though. Review again, please! > > Jeff Thanks, -Jiangning
csel4.patch
Description: csel4.patch