On Wed, Jan 14, 2015 at 2:14 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > I did all changes proposed by you and add couple tests. > Bootstrap, including aggressive one proposed by you, and regression > testing did not show any new failures. > > Is it OK for trunk?
+++ b/gcc/testsuite/gcc.dg/vect/vect-aggressive-1.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ +/* { dg-require-effective-target fopenmp } */ +/* { dg-options "-fopenmp" } */ please use { dg-additional-options "-fopenmp-simd" } instead and vect_simd_clones target, not fopenmp target. +/* { dg-options "-mavx2 -ffast-math -O3 -fopenmp -fdump-tree-vect-details" } */ likewise (for -ffast-math - don't use -mavx2, instead require a proper vect target on the scan-tree-dump-times line It would also be nice to have these runtime testcases so it can be verified the code executes correctly instead of possibly producing random crap ;) The patch is ok with the above changes. Thanks, Richard. > ChangeLog: > > 2015-01-14 Yuri Rumyantsev <ysrum...@gmail.com> > > * tree-if-conv.c: Include hash-map.h. > (aggressive_if_conv): New variable. > (fold_build_cond_expr): Add simplification of non-zero condition. > (add_to_dst_predicate_list): Invoke add_to_predicate_list if edge > destination block is not always executed. > (if_convertible_phi_p): Fix commentary, allow phi nodes have more > than two predecessors if AGGRESSIVE_IF_CONV is true. > (if_convertible_stmt_p): Fix commentary. > (all_preds_critical_p): New function. > (has_pred_critical_p): New function. > (if_convertible_bb_p): Fix commentary, if AGGRESSIVE_IF_CONV is true > BB can have more than two predecessors and all incoming edges can be > critical. > (predicate_bbs): Skip predication for loop exit block, use build2_loc > to compute predicate for true edge. > (find_phi_replacement_condition): Delete this function. > (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. > Allow interchange PHI arguments if EXTENDED is false. > Change check that block containing reduction statement candidate > is predecessor of phi-block since phi may have more than two arguments. > (phi_args_hash_traits): New helper structure. > (struct phi_args_hash_traits): New type. > (phi_args_hash_traits::hash): New function. > (phi_args_hash_traits::equal_keys): New function. > (gen_phi_arg_condition): New function. > (predicate_scalar_phi): Add handling of phi nodes with more than two > arguments, delete COND and TRUE_BB arguments, insert body of > find_phi_replacement_condition to predicate ordinary phi nodes. > (predicate_all_scalar_phis): Skip blocks with the only predecessor, > delete call of find_phi_replacement_condition and invoke > predicate_scalar_phi with two arguments. > (insert_gimplified_predicates): Add assert that non-predicated block > don't have statements to insert. > (ifcvt_split_critical_edges): New function. > (ifcvt_split_def_stmt): Likewise. > (ifcvt_walk_pattern_tree): Likewise. > (stmt_is_root_of_bool_pattern): Likewise. > (ifcvt_repair_bool_pattern): Likewise. > (ifcvt_local_dce): Likewise. > (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which > is copy of inner or outer loop force_vectorize field, invoke > ifcvt_split_critical_edges, ifcvt_local_dce and > ifcvt_repair_bool_pattern for aggressive if-conversion. > > gcc/testsuite/ChangeLog > > * gcc.dg/vect/vect-aggressive-1.c: New test. > * gcc.target/i386/avx2-vect-aggressive.c: Likewise. > > 2015-01-09 15:27 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> I changed algorithm for bool pattern repair. >>> It turned out that ifcvt_local_dce phaase is required since for >>> test-case I sent you in previous mail vectorization is not performed >>> without dead code elimination: >>> >>> For the loop >>> #pragma omp simd safelen(8) >>> for (i=0; i<512; i++) >>> { >>> float t = a[i]; >>> if (t > 0.0f & t < 1.0e+17f) >>> if (c[i] != 0) >>> res += 1; >>> } >>> >>> I've got the following message from vectorizer: >>> >>> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0; >>> >>> t3.c:10:11: note: bit-precision arithmetic not supported. >>> t3.c:10:11: note: not vectorized: relevant stmt not supported: >>> _ifc__39 = t_5 > 0.0; >>> >>> It is caused by the following dead predicate computations after >>> critical edge splitting: >>> >>> (after combine blocks): >>> >>> <bb 3>: >>> # res_15 = PHI <res_1(7), 0(19)> >>> # i_16 = PHI <i_11(7), 0(19)> >>> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)> >>> t_5 = a[i_16]; >>> _6 = t_5 > 0.0; >>> _7 = t_5 < 9.9999998430674944e+16; >>> _8 = _6 & _7; >>> _10 = &c[i_16]; >>> _ifc__36 = _8 ? 4294967295 : 0; >>> _9 = MASK_LOAD (_10, 0B, _ifc__36); >>> _28 = _8; >>> _29 = _9 != 0; >>> _30 = _28 & _29; >>> // Statements below are dead!! >>> _31 = _8; >>> _32 = _9 != 0; >>> _33 = ~_32; >>> _34 = _31 & _33; >>> // End of dead statements. >>> _ifc__35 = _30 ? 1 : 0; >>> res_1 = res_15 + _ifc__35; >>> i_11 = i_16 + 1; >>> ivtmp_13 = ivtmp_14 - 1; >>> if (ivtmp_13 != 0) >>> goto <bb 7>; >>> else >>> goto <bb 8>; >>> >>> But if we delete these statements loop will be vectorized. >> >> Hm, ok. We insert predicates too early obviously and not only when >> needed. But let's fix that later. >> >>> New patch is attached. >> >> fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs) >> { >> tree rhs1, lhs1, cond_expr; >> + >> + /* If COND is comparison r != 0 and r has boolean type, convert COND >> + to SSA_NAME to accept by vect bool pattern. */ >> + if (TREE_CODE (cond) == NE_EXPR) >> + { >> + tree op0 = TREE_OPERAND (cond, 0); >> + tree op1 = TREE_OPERAND (cond, 1); >> + if (TREE_CODE (op0) == SSA_NAME >> + && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE >> + && (integer_zerop (op1))) >> + cond = op0; >> + else if (TREE_CODE (op1) == SSA_NAME >> + && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE >> + && (integer_zerop (op0))) >> + cond = op1; >> >> The 2nd form, 0 != SSA_NAME doesn't happen due to operand >> canonicalization. Please remove its handling. >> >> + if (gimple_phi_num_args (phi) != 2) >> + { >> + if (!aggressive_if_conv) >> >> && !aggressive_if_conv >> >> + if (EDGE_COUNT (bb->preds) > 2) >> + { >> + if (!aggressive_if_conv) >> >> Likewise. >> >> - gimple reduc; >> + && (rhs = gimple_phi_arg_def (phi, 0)))) { >> >> the { goes to the next line >> >> static void >> predicate_mem_writes (loop_p loop) >> { >> - unsigned int i, orig_loop_num_nodes = loop->num_nodes; >> + unsigned int i, j, orig_loop_num_nodes = loop->num_nodes; >> + tree mask_vec[10]; >> >> an upper limit of 10? >> >> + for (j=0; j<10; j++) >> >> spaces around '<' and '=' >> >> + mask_vec[j] = NULL_TREE; >> + >> >> + gcc_assert (exact_log2 (bitsize) != -1); >> + if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE) >> + { >> >> this seems to be a completely separate "optimization"? Note that >> there are targets with non-power-of-two bitsize modes (PSImode), >> so the assert will likely trigger. I would prefer if you separate this >> part of the patch. >> >> + if ( gimple_code (stmt) != GIMPLE_ASSIGN) >> + continue; >> >> no space before gimple_code >> >> + imm_use_iterator imm_iter; >> + >> + >> + worklist.create (64); >> >> excessive vertical space. >> >> The patch misses the addition of new testcases - please add some, >> otherwise the code will be totally untested. >> >> I assume the patch passes bootstrap and regtest (you didn't say so). >> Can you also do a bootstrap with aggressive_if_conv forced to >> true and --with-build-config=bootstrap-O3 --disable-werror? >> >> Thanks, >> Richard. >> >>> Thanks. >>> Yuri. >>> >>> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> I am sending you full patch (~1000 lines) but if you need only patch.1 >>>>> and patch.2 will let me know and i'll send you reduced patch. >>>>> >>>>> Below are few comments regarding your remarks for patch.3. >>>>> >>>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case >>>>> when dead code elimination is required to vectorize loop, i.e. dead >>>>> statement is marked as relevant. >>>>> 2. You wrote: >>>>>> The "retry" code also looks odd - why do you walk the BB multiple >>>>>> times instead of just doing sth like >>>>>> >>>>>> while (!has_single_use (lhs)) >>>>>> { >>>>>> gimple copy = ifcvt_split_def_stmt (def_stmt); >>>>>> ifcvt_walk_pattern_tree (copy); >>>>>> } >>>>>> >>>>>> thus returning the copy you create and re-process it (the copy should >>>>>> now have a single-use). >>>>> >>>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses >>>>> but some intermediate variables too. For example, for the following >>>>> test-case >>>>> >>>>> float a[1000]; >>>>> int c[1000]; >>>>> >>>>> int foo() >>>>> { >>>>> int i, res = 0; >>>>> #pragma omp simd safelen(8) >>>>> for (i=0; i<512; i++) >>>>> { >>>>> float t = a[i]; >>>>> if (t > 0.0f & t < 1.0e+17f) >>>>> if (c[i] != 0) >>>>> res += 1; >>>>> } >>>>> return res; >>>>> } >>>>> >>>>> After combine_blocks we have the following bb: >>>>> >>>>> <bb 3>: >>>>> # res_15 = PHI <res_1(7), 0(15)> >>>>> # i_16 = PHI <i_11(7), 0(15)> >>>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)> >>>>> t_5 = a[i_16]; >>>>> _6 = t_5 > 0.0; >>>>> _7 = t_5 < 9.9999998430674944e+16; >>>>> _8 = _6 & _7; >>>>> _10 = &c[i_16]; >>>>> _ifc__32 = _8 ? 4294967295 : 0; >>>>> _9 = MASK_LOAD (_10, 0B, _ifc__32); >>>>> _28 = _8; >>>>> _29 = _9 != 0; >>>>> _30 = _28 & _29; >>>>> _ifc__31 = _30 ? 1 : 0; >>>>> res_1 = res_15 + _ifc__31; >>>>> i_11 = i_16 + 1; >>>>> ivtmp_13 = ivtmp_14 - 1; >>>>> if (ivtmp_13 != 0) >>>>> goto <bb 7>; >>>>> else >>>>> goto <bb 8>; >>>>> >>>>> and we can see that _8 has multiple uses. Also note that after splitting >>>>> of >>>>> _8 = _6 & _7 >>>>> we also get multiple uses for definition of _6 and _7. So I used this >>>>> iterative algorithm as the simplest one. >>>> >>>> But it walks the entire pattern again and again while you only need to >>>> ensure you walk the pattern tree of the now single-use DEF again >>>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt >>>> you should pass down the user_operand_p that you need to make >>>> single-use). >>>> >>>>> I think it would be nice to re-use some utility from tree-vect-patterns.c >>>>> for stmt_is_root_of_bool_pattern. >>>>> >>>>> I assume that function stmt_is_root_of_bool_pattern can be simplified >>>>> to check on COND_EXPR only since PHI predication and memory access >>>>> predication produced only such statements,i.e. it can look like >>>>> >>>>> static bool >>>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var) >>>>> { >>>>> enum tree_code code; >>>>> tree lhs, rhs; >>>>> >>>>> code = gimple_assign_rhs_code (stmt); >>>>> if (code == COND_EXPR) >>>>> { >>>>> rhs = gimple_assign_rhs1 (stmt); >>>>> if (TREE_CODE (rhs) != SSA_NAME) >>>>> return false; >>>>> *var = rhs; >>>>> return true; >>>>> } >>>>> return false; >>>>> } >>>>> >>>>> I also did few minor changes in patch.2. >>>>> >>>>> 3. You can also notice that I inserted code in tree_if_conversion to >>>>> do loop version if explicit option "-ftree-loop-if-convert" was not >>>>> passed to compiler, i.e. we perform if-conversion for loop >>>>> vectorization only and if it does not take place, we should delete >>>>> if-converted version of loop. >>>>> What is your opinion? >>>> >>>> Overall part 1 and part 2 look good to me, predicate_scalar_phi >>>> looks in need of some refactoring to avoid duplicate code. We can >>>> do that a followup. >>>> >>>> Part 3 still needs the iteration to be resolved and make the use we >>>> actually care about single-use, not a random one so we can avoid >>>> iterating completely. >>>> >>>> Richard. >>>> >>>>> Thanks. >>>>> Yuri. >>>>> >>>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> Here is updated patch which includes >>>>>>> (1) split critical edges for aggressive if conversion. >>>>>>> (2) delete all stuff related to support of critical edge predication. >>>>>>> (3) only one function - predicate_scalar_phi performs predication. >>>>>>> (4) function find_phi_replacement_condition was deleted since it was >>>>>>> included in predicate_scalar_phi for phi with two arguments. >>>>>>> >>>>>>> I checked that patch works in stress testing mode, i.e. with >>>>>>> aggressive if conversion by default. >>>>>>> >>>>>>> What is your opinion? >>>>>> >>>>>> Looks ok overall, but please simply do >>>>>> >>>>>> FOR_EACH_EDGE (e, ei, bb->succs) >>>>>> if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop) >>>>>> split_edge (e); >>>>>> >>>>>> for all blocks apart from the latch. >>>>>> >>>>>> Can you please send a combined patch up to this one? Looking at >>>>>> the incremental diff is somewhat hard. Thus a patch including all >>>>>> patches from patch1 to this one. >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> >>>>>>> Thanks. >>>>>>> Yuri. >>>>>>> >>>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> Thanks for your reply! >>>>>>>>> >>>>>>>>> I didn't understand your point: >>>>>>>>> >>>>>>>>> Well, I don't mind splitting all critical edges unconditionally >>>>>>>>> >>>>>>>>> but you do it unconditionally in proposed patch. >>>>>>>> >>>>>>>> I don't mind means I am fine with it. >>>>>>>> >>>>>>>>> Also I assume that >>>>>>>>> call of split_critical_edges() can break ssa. For example, we can >>>>>>>>> split headers of loops, loop exit blocks etc. >>>>>>>> >>>>>>>> How does that "break SSA"? You mean loop-closed SSA? I'd >>>>>>>> be surprised if so but that may be possible. >>>>>>>> >>>>>>>>> I prefer to do something >>>>>>>>> more loop-specialized, e.g. call edge_split() for critical edges >>>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge >>>>>>>>> destination bb belongs to loop). >>>>>>>> >>>>>>>> That works for me as well but it is more complicated to implement. >>>>>>>> Ideally you'd only split one edge if you find a block with only >>>>>>>> critical >>>>>>>> predecessors (where we'd currently give up). But note that this >>>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it >>>>>>>> will change loop->num_nodes so we have to be more careful in >>>>>>>> constructing the loop calling if_convertible_bb_p. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> >>>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> Sorry that I forgot to delete debug dump from my fix. >>>>>>>>>>> I have few questions about your comments. >>>>>>>>>>> >>>>>>>>>>> 1. You wrote : >>>>>>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>>>>>> path >>>>>>>>>>> Did you mean that I must combine predicate_scalar_phi and >>>>>>>>>>> predicate_extended scalar phi to one function? >>>>>>>>>>> Please note that if additional flag was not set up (i.e. >>>>>>>>>>> aggressive_if_conv is false) extended predication is required more >>>>>>>>>>> compile time since it builds hash_map. >>>>>>>>>> >>>>>>>>>> It's compile-time complexity is reasonable enough even for >>>>>>>>>> non-aggressive if-conversion. >>>>>>>>>> >>>>>>>>>>> 2. About critical edge splitting. >>>>>>>>>>> >>>>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv >>>>>>>>>>> option only; (2) should we split all critical edges. >>>>>>>>>>> Note that this leads to recomputing of topological order. >>>>>>>>>> >>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus >>>>>>>>>> do something like >>>>>>>>>> >>>>>>>>>> Index: gcc/tree-if-conv.c >>>>>>>>>> =================================================================== >>>>>>>>>> --- gcc/tree-if-conv.c (revision 218515) >>>>>>>>>> +++ gcc/tree-if-conv.c (working copy) >>>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f >>>>>>>>>> if (number_of_loops (fun) <= 1) >>>>>>>>>> return 0; >>>>>>>>>> >>>>>>>>>> + bool critical_edges_split_p = false; >>>>>>>>>> FOR_EACH_LOOP (loop, 0) >>>>>>>>>> if (flag_tree_loop_if_convert == 1 >>>>>>>>>> || flag_tree_loop_if_convert_stores == 1 >>>>>>>>>> || ((flag_tree_loop_vectorize || loop->force_vectorize) >>>>>>>>>> && !loop->dont_vectorize)) >>>>>>>>>> - todo |= tree_if_conversion (loop); >>>>>>>>>> + { >>>>>>>>>> + if (!critical_edges_split_p) >>>>>>>>>> + { >>>>>>>>>> + split_critical_edges (); >>>>>>>>>> + critical_edges_split_p = true; >>>>>>>>>> + todo |= TODO_cleanup_cfg; >>>>>>>>>> + } >>>>>>>>>> + todo |= tree_if_conversion (loop); >>>>>>>>>> + } >>>>>>>>>> >>>>>>>>>> #ifdef ENABLE_CHECKING >>>>>>>>>> { >>>>>>>>>> >>>>>>>>>>> It is worth noting that in current implementation bb's with 2 >>>>>>>>>>> predecessors and both are on critical edges are accepted without >>>>>>>>>>> additional option. >>>>>>>>>> >>>>>>>>>> Yes, I know. >>>>>>>>>> >>>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more >>>>>>>>>> to it and even fix the critical edge missed optimization with >>>>>>>>>> splitting >>>>>>>>>> critical edges then I am all for that solution. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> Thanks ahead. >>>>>>>>>>> Yuri. >>>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener >>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> Richard, >>>>>>>>>>>>> >>>>>>>>>>>>> Here is updated patch2 with the following changes: >>>>>>>>>>>>> 1. Delete functions phi_has_two_different_args and >>>>>>>>>>>>> find_insertion_point. >>>>>>>>>>>>> 2. Use only one function for extended predication - >>>>>>>>>>>>> predicate_extended_scalar_phi. >>>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic >>>>>>>>>>>>> blocks if it has 2 predecessors and >>>>>>>>>>>>> both incoming edges are critical or it gas more than 2 >>>>>>>>>>>>> predecessors >>>>>>>>>>>>> and at least one incoming edge >>>>>>>>>>>>> is critical. This saved iterator can be used by extended phi >>>>>>>>>>>>> predication. >>>>>>>>>>>>> >>>>>>>>>>>>> Here is motivated test-case which explains this point. >>>>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2 >>>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options. >>>>>>>>>>>>> The problem phi is in bb-7: >>>>>>>>>>>>> >>>>>>>>>>>>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>>>>>>>>>>>> { >>>>>>>>>>>>> <bb 5>: >>>>>>>>>>>>> xmax_edge_18 = xmax_edge_36 + 1; >>>>>>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>>>>>> goto <bb 7>; >>>>>>>>>>>>> else >>>>>>>>>>>>> goto <bb 9>; >>>>>>>>>>>>> >>>>>>>>>>>>> } >>>>>>>>>>>>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>>>>>>>>>>>> { >>>>>>>>>>>>> <bb 6>: >>>>>>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>>>>>> goto <bb 7>; >>>>>>>>>>>>> else >>>>>>>>>>>>> goto <bb 8>; >>>>>>>>>>>>> >>>>>>>>>>>>> } >>>>>>>>>>>>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>>>>>>>>>>>> { >>>>>>>>>>>>> <bb 7>: >>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>>>> goto <bb 11>; >>>>>>>>>>>>> >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment >>>>>>>>>>>>> out >>>>>>>>>>>>> restoring gsi in predicate_all_scalar_phi: >>>>>>>>>>>>> #if 0 >>>>>>>>>>>>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>>>>>>>>>>>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>>>>>>>>>>>> gsi = bb_insert_point (bb); >>>>>>>>>>>>> else >>>>>>>>>>>>> #endif >>>>>>>>>>>>> gsi = gsi_after_labels (bb); >>>>>>>>>>>>> >>>>>>>>>>>>> we will get ICE: >>>>>>>>>>>>> t5.c: In function 'foo': >>>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use >>>>>>>>>>>>> void foo (int n) >>>>>>>>>>>>> ^ >>>>>>>>>>>>> for SSA_NAME: _1 in statement: >>>>>>>>>>>>> _52 = _1 & _3; >>>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed >>>>>>>>>>>>> >>>>>>>>>>>>> smce predicate computations were inserted in bb_7. >>>>>>>>>>>> >>>>>>>>>>>> The issue is obviously that the predicates have already been >>>>>>>>>>>> emitted >>>>>>>>>>>> in the target BB - that's of course the wrong place. This is done >>>>>>>>>>>> by insert_gimplified_predicates. >>>>>>>>>>>> >>>>>>>>>>>> This just shows how edge predicate handling is broken - we don't >>>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates >>>>>>>>>>>> but push those to e->dest which makes this really messy. >>>>>>>>>>>> >>>>>>>>>>>> Rather than having a separate phase where we insert all >>>>>>>>>>>> gimplified bb predicates we should do that on-demand when >>>>>>>>>>>> predicating a PHI. >>>>>>>>>>>> >>>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard >>>>>>>>>>>> the printfs properly. >>>>>>>>>>>> >>>>>>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>>>>>> paths. >>>>>>>>>>>> >>>>>>>>>>>> I'm not at all happy with this code. It may be existing if-conv >>>>>>>>>>>> codes >>>>>>>>>>>> fault but making it even worse is not an option. >>>>>>>>>>>> >>>>>>>>>>>> Again - what's wrong with simply splitting critical edges if >>>>>>>>>>>> aggressive_if_conv? I think that would very much simplify >>>>>>>>>>>> things here. Or alternatively use gsi_insert_on_edge and >>>>>>>>>>>> commit edge insertions before merging the blocks. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> ChangeLog is >>>>>>>>>>>>> >>>>>>>>>>>>> 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>> >>>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h. >>>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple >>>>>>>>>>>>> statement iterator. >>>>>>>>>>>>> (bb_insert_point): New function. >>>>>>>>>>>>> (set_bb_insert_point): New function. >>>>>>>>>>>>> (has_pred_critical_p): New function. >>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>>>>> AGGRESSIVE_IF_CONV is true. >>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and >>>>>>>>>>>>> EXTENDED. >>>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false. >>>>>>>>>>>>> Change check that block containing reduction statement candidate >>>>>>>>>>>>> is predecessor of phi-block since phi may have more than two >>>>>>>>>>>>> arguments. >>>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of >>>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>> (struct phi_args_hash_traits): New type. >>>>>>>>>>>>> (phi_args_hash_traits::hash): New function. >>>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function. >>>>>>>>>>>>> (gen_phi_arg_condition): New function. >>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and >>>>>>>>>>>>> set it >>>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both >>>>>>>>>>>>> incoming edges are critical. Invoke >>>>>>>>>>>>> find_phi_replacement_condition and >>>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has >>>>>>>>>>>>> more >>>>>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical. >>>>>>>>>>>>> Use standard gsi_after_labels otherwise. >>>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable >>>>>>>>>>>>> EXTENDED_PREDICATION >>>>>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it >>>>>>>>>>>>> to >>>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either >>>>>>>>>>>>> number of predecessors is geater 2 and at least one >>>>>>>>>>>>> incoming edge is >>>>>>>>>>>>> critical. >>>>>>>>>>>>> Add check that non-predicated block may have statements to insert. >>>>>>>>>>>>> Insert predicate computation of BB just after label if >>>>>>>>>>>>> EXTENDED_PREDICATION is true. >>>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV >>>>>>>>>>>>> which >>>>>>>>>>>>> is copy of inner or outer loop force_vectorize field. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener >>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev >>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has >>>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> typedef struct bb_predicate_s { >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /* The condition under which this basic block is executed. */ >>>>>>>>>>>>>>> tree predicate; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>>>>>>>>>>>> recorded here, in order to avoid the duplication of >>>>>>>>>>>>>>> computations >>>>>>>>>>>>>>> that occur in previous conditions. See PR44483. */ >>>>>>>>>>>>>>> gimple_seq predicate_gimplified_stmts; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> /* Insertion point for blocks having incoming critical edges. >>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>> gimple_stmt_iterator gsi; >>>>>>>>>>>>>>> } *bb_predicate_p; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> and this iterator is saved in insert_gimplified_predicates >>>>>>>>>>>>>>> before >>>>>>>>>>>>>>> insertion code for predicate computation. I checked that this >>>>>>>>>>>>>>> fix >>>>>>>>>>>>>>> works. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Huh? I still wonder what the issue is with inserting everything >>>>>>>>>>>>>> after the PHI we predicate. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Well, your updated patch will come with testcases for the >>>>>>>>>>>>>> testsuite >>>>>>>>>>>>>> that will hopefully fail if doing that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and >>>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener >>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>> Thanks Richard for your quick reply! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about >>>>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi >>>>>>>>>>>>>>>>> predication >>>>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical >>>>>>>>>>>>>>>>> edge >>>>>>>>>>>>>>>>> splitting is not a good decision. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Why not always insert before the use? Which would be after >>>>>>>>>>>>>>>> labels, >>>>>>>>>>>>>>>> what we do for two-arg PHIs. That is, how can it be that you >>>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its >>>>>>>>>>>>>>>> incoming >>>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in >>>>>>>>>>>>>>>> any case. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of >>>>>>>>>>>>>>>>>>> gimple_phi_arg_edge. >>>>>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Let's consider the following simple test-case: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>>>>>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 >>>>>>>>>>>>>>>>>>> arguments only >>>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for >>>>>>>>>>>>>>>>>>> reduction in >>>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we >>>>>>>>>>>>>>>>>>> can't do it >>>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have >>>>>>>>>>>>>>>>>>> to produce >>>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see >>>>>>>>>>>>>>>>>>> comments >>>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> How so? We can always use !(condition) for the "last" >>>>>>>>>>>>>>>>>> value, thus >>>>>>>>>>>>>>>>>> treat it as an 'else' case. That even works for >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances >>>>>>>>>>>>>>>>>> first >>>>>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its >>>>>>>>>>>>>>>>>> inversion >>>>>>>>>>>>>>>>>> used for edges 3 and 4 combined. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr ( >>>>>>>>>>>>>>>>>>> t5.c is >>>>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi >>>>>>>>>>>>>>>>>>> nodes has >>>>>>>>>>>>>>>>>>> only critical incoming edges and both contain code >>>>>>>>>>>>>>>>>>> computing edge >>>>>>>>>>>>>>>>>>> predicates, e.g. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> <bb 7>: >>>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>>>>>>>>>>>> _48 = _46 & _47; >>>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>>>>>>>>>>>> _54 = ~_53; >>>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>>>>>>>>>>>> _56 = _54 & _55; >>>>>>>>>>>>>>>>>>> _57 = _48 | _56; >>>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>>>>>>>>>> goto <bb 11>; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the >>>>>>>>>>>>>>>>>>> block >>>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi >>>>>>>>>>>>>>>>>>> arguments >>>>>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi >>>>>>>>>>>>>>>>>>> result can >>>>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication >>>>>>>>>>>>>>>>>>> code to the >>>>>>>>>>>>>>>>>>> block end. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates >>>>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally >>>>>>>>>>>>>>>>>> impossible >>>>>>>>>>>>>>>>>> for critical edges unless you split them). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions >>>>>>>>>>>>>>>>>> to such issues, >>>>>>>>>>>>>>>>>> like splitting the edge! Certainly not involving a function >>>>>>>>>>>>>>>>>> walking >>>>>>>>>>>>>>>>>> GENERIC expressions. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Let me know if you still have any questions. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener >>>>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting >>>>>>>>>>>>>>>>>>>>> since it may >>>>>>>>>>>>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was >>>>>>>>>>>>>>>>>>>>> introduced >>>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are >>>>>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI >>>>>>>>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>>>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>>>>>>>>>>>> New function phi_has_two_different_args was introduced >>>>>>>>>>>>>>>>>>>>> to detect such phi. >>>>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption >>>>>>>>>>>>>>>>>>>>> that at >>>>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not >>>>>>>>>>>>>>>>>>>>> critical - it >>>>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of >>>>>>>>>>>>>>>>>>>>> normal edge >>>>>>>>>>>>>>>>>>>>> are already inserted above this block and >>>>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the >>>>>>>>>>>>>>>>>>>>> beginning of >>>>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which >>>>>>>>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>>>>>>>> computations are in the block where code for phi >>>>>>>>>>>>>>>>>>>>> predication must be >>>>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is >>>>>>>>>>>>>>>>>>>>> introduced which is >>>>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining >>>>>>>>>>>>>>>>>>>>> predicates >>>>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi >>>>>>>>>>>>>>>>>>>>> predication code >>>>>>>>>>>>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 >>>>>>>>>>>>>>>>>>>> +2075,8 @@ >>>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and >>>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi. >>>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs >>>>>>>>>>>>>>>>>>>> after value >>>>>>>>>>>>>>>>>>>> and handle equal values specially in >>>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi? >>>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point. All >>>>>>>>>>>>>>>>>>>> SSA names >>>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the >>>>>>>>>>>>>>>>>>>> complex CFG >>>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will >>>>>>>>>>>>>>>>>>>> dominate the >>>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the >>>>>>>>>>>>>>>>>>>> other case. >>>>>>>>>>>>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks >>>>>>>>>>>>>>>>>>>> of course needs >>>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens >>>>>>>>>>>>>>>>>>>> already?) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a >>>>>>>>>>>>>>>>>>>> flag even >>>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args >>>>>>>>>>>>>>>>>>>> multiple >>>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do >>>>>>>>>>>>>>>>>>>> this as >>>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize >>>>>>>>>>>>>>>>>>>> flag >>>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply >>>>>>>>>>>>>>>>>>>> 'aggressive'. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 >>>>>>>>>>>>>>>>>>>>> predecessors if >>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least >>>>>>>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to >>>>>>>>>>>>>>>>>>>>> choose access >>>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to >>>>>>>>>>>>>>>>>>>>> get phi >>>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to >>>>>>>>>>>>>>>>>>>>> insert >>>>>>>>>>>>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means >>>>>>>>>>>>>>>>>>>>> non-extended >>>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add >>>>>>>>>>>>>>>>>>>>> argument >>>>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables >>>>>>>>>>>>>>>>>>>>> EXTENDED and >>>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi >>>>>>>>>>>>>>>>>>>>> has more >>>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. >>>>>>>>>>>>>>>>>>>>> Invoke >>>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi >>>>>>>>>>>>>>>>>>>>> depending on >>>>>>>>>>>>>>>>>>>>> EXTENDED value. >>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that >>>>>>>>>>>>>>>>>>>>> non-predicated block >>>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB >>>>>>>>>>>>>>>>>>>>> just after label >>>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of >>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which >>>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.