On Wed, Nov 2, 2016 at 11:08 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, Nov 1, 2016 at 4:24 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>> On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>>> Hi, >>>>>> Due to some reasons, tree-if-conv.c now factors floating point >>>>>> comparison out of cond_expr, >>>>>> resulting in mixed types in it. This does help CSE on common comparison >>>>>> operations. >>>>>> Only problem is that test gcc.dg/vect/pr56541.c now requires >>>>>> vect_cond_mixed to be >>>>>> vectorized. This patch changes the test in that way. >>>>>> Test result checked. Is it OK? >>>>> >>>>> Hmm, I think the fix is to fix if-conversion not doing that. Can you >>>>> track down why this happens? >>>> Hmm, but there are several common floating comparison operations in >>>> the case, by doing this, we could do CSE on GIMPLE, otherwise we >>>> depends on RTL optimizers. >>> >>> I see. >>> >>>> I thought we prefer GIMPLE level >>>> transforms? >>> >>> Yes, but the vectorizer is happier with the conditions present in the >>> COND_EXPR >>> and thus we concluded we always want to have them there. forwprop will >>> also aggressively put them back. Note that we cannot put back >>> tree_could_throw_p >>> conditions (FP compares with signalling nans for example) to properly model >>> EH >>> (though for VEC_COND_EXPRs we don't really care here). >>> >>> Note that nothing between if-conversion and vectorization will perform >>> the desired >>> CSE anyway. >> Hi Richard, >> I looked into this one and found it was not if-conv factors cond_expr >> out. For test case: >> >> for (i=0; i!=1024; ++i) >> { >> float rR = a*z[i]; >> float rL = b*z[i]; >> float rMin = (rR<rL) ? rR : rL; >> float rMax = (rR<rL) ? rL : rR; >> rMin = (rMax>0) ? rMin : rBig; >> rMin = (rMin>0) ? rMin : rMax; >> ok[i] = rMin-c<rMax+d; >> } >> Dump before jump threading is like: >> >> <bb 7>: >> # iftmp.3_12 = PHI <rL_18(5), rR_17(6)> >> if (iftmp.3_12 > 0.0) >> goto <bb 9>; >> else >> goto <bb 8>; >> >> <bb 8>: >> >> <bb 9>: >> # iftmp.4_13 = PHI <iftmp.2_11(7), 1.5e+2(8)> >> if (iftmp.4_13 > 0.0) >> goto <bb 11>; >> else >> goto <bb 10>; >> >> <bb 10>: >> >> <bb 11>: >> # iftmp.5_14 = PHI <iftmp.4_13(9), iftmp.3_12(10)> >> >> Jump thread in dom pass threads edges (bb7 -> bb8 -> ... bb11) to (bb6 >> -> bb12 -> bb9) as below: >> >> <bb 6>: >> # iftmp.3_12 = PHI <rL_18(4), rR_17(5), rL_18(11)> >> # iftmp.2_23 = PHI <iftmp.2_11(4), iftmp.2_11(5), iftmp.2_10(11)> >> if (iftmp.3_12 > 0.0) >> goto <bb 7>; >> else >> goto <bb 12>; >> >> <bb 7>: >> # iftmp.4_13 = PHI <iftmp.2_23(6)> >> if (iftmp.4_13 > 0.0) >> goto <bb 9>; >> else >> goto <bb 8>; >> >> <bb 8>: >> >> <bb 9>: >> # iftmp.5_14 = PHI <iftmp.4_13(7), iftmp.3_12(8), 1.5e+2(12)> >> >> //... >> <bb 12>: >> # iftmp.4_22 = PHI <1.5e+2(6)> >> goto <bb 9>; >> >> This transform saves one comparison on the path, but creates multi-arg >> phi, resulting in cond_expr being factored out. >> >> Looks like threading corrupts vectorization opportunity for target >> doesn't support vect_cond_mixed, but I guess it's hard to tell in >> threading itself. Any ideas? > > First of all I wonder why MIN/MAX are not pattern matched earlier... > (I'd generally avoid threading through them). Hmm, this case is not MIN/MAX cases, though it uses rMin/rMax as variable names. So it cannot be pattern matched.
As for below cases you mentioned. Since I am introducing more/advanced pattern in match.pd and adding call to fold_stmt in fold_stmt, I think they can be addressed in this way too, I will keep this in mind in my cond_expr pattern work. So we don't need to go that far to vector pattern matches. Thanks, bin > > But yes, I see that if-conversion can cause factored out conditons. > Also for the simple case of ! condition: > > _49 = iftmp.3_12 > 0.0; > _50 = ~_49; > iftmp.5_14 = _50 ? 1.5e+2 : _ifc__48; > > but those should have been eliminated by swapping operands (looks like > that doesn't reliably work). > > In the end we have the bool patterns in the vectorizer to recover from this, > so I wonder why that doesn't work then. It should re-write the above to > > _111 = iftmp.3_12 > 0.0 ? 1 : 0; // of proper type > _112 = ~ _111; > iftmp.5_14 = _112 != 0 ? ....; > > Richard. > >> Thanks, >> bin