> > > > > >
> > > > > > Perhaps I'm missing something here?
> > > > >
> > > > > OK, so I refreshed my mind of what
> > > > > vect_update_ivs_after_vectorizer
> > > does.
> > > > >
> > > > > I still do not understand the (complexity of the) patch.
> > > > > Basically the function computes the new value of the IV "from
> > > > > scratch" based on the number of scalar iterations of the vector loop,
> the 'niter'
> > > > > argument. I would have expected that for the early exits we
> > > > > either pass in a different 'niter' or alternatively a
> > > > > 'niter_adjustment'.
> > > >
> > > > But for an early exit there's no static value for adjusted niter,
> > > > since you don't know which iteration you exited from. Unlike the
> > > > normal exit when you know if you get there you've done all
> > > > possible
> > > iterations.
> > > >
> > > > So you must compute the scalar iteration count on the exit itself.
> > >
> > > ? You do not need the actual scalar iteration you exited (you don't
> > > compute that either), you need the scalar iteration the vector
> > > iteration started with when it exited prematurely and that's readily
> available?
> >
> > For a normal exit yes, not for an early exit no? niters_vector_mult_vf
> > is only valid for the main exit.
> >
> > There's the unadjusted scalar count, which is what it's using to
> > adjust it to the final count. Unless I'm missing something?
>
> Ah, of course - niters_vector_mult_vf is for the countable exit. For the
> early
> exits we can't precompute the scalar iteration value. But that then means we
> should compute the appropriate "continuation" as live value of the vectorized
> IVs even when they were not originally used outside of the loop. I don't see
> how we can express this in terms of the scalar IVs in the (not yet) vectorized
> loop - similar to the reduction case you are going to end up with the wrong
> values here.
>
> That said, I've for a long time wanted to preserve the original control IV
> also for
> the vector code (leaving any "optimization"
> to IVOPTs there), that would enable us to compute the correct
> "niters_vector_mult_vf" based on that IV.
>
> So given we cannot use the scalar IVs you have to handle all inductions
> (besides the main exit control IV) in vectorizable_live_operation I think.
>
That's what I currently do, that's why there was the
if (STMT_VINFO_LIVE_P (phi_info))
continue;
although I don't understand why we use the scalar count, I suppose the
reasoning
is that we don't really want to keep it around, and referencing it forces it to
be kept?
At the moment it just does `init + (final - init) * vf` which is correct no?
Also you missed the question below about how to avoid the creation of the block,
You ok with changing that?
Thanks,
Tamar
> Or for now disable early-break for inductions that are not the main exit
> control
> IV (in vect_can_advance_ivs_p)?
>
> > > > >
> > > > > It seems your change handles different kinds of inductions
> > > > > differently.
> > > > > Specifically
> > > > >
> > > > > bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > if (restart_loop && ivtemp)
> > > > > {
> > > > > type = TREE_TYPE (gimple_phi_result (phi));
> > > > > ni = build_int_cst (type, vf);
> > > > > if (inversed_iv)
> > > > > ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > fold_convert (type, step_expr));
> > > > > }
> > > > >
> > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > as the new value. That seems to be very odd special casing for
> > > > > unknown reasons. And while you adjust vec_step_op_add, you
> > > > > don't adjust vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > better assert
> > > here).
> > > >
> > > > The VF case is for a normal "non-inverted" loop, where if you take
> > > > an early exit you know that you have to do at most VF iterations.
> > > > The VF
> > > > - step is to account for the inverted loop control flow where you
> > > > exit after adjusting the IV already by + step.
> > >
> > > But doesn't that assume the IV counts from niter to zero? I don't
> > > see this special case is actually necessary, no?
> > >
> >
> > I needed it because otherwise the scalar loop iterates one iteration
> > too little So I got a miscompile with the inverter loop stuff. I'll
> > look at it again perhaps It can be solved differently.
> >
> > > >
> > > > Peeling doesn't matter here, since you know you were able to do a
> > > > vector iteration so it's safe to do VF iterations. So having
> > > > peeled doesn't affect the remaining iters count.
> > > >
> > > > >
> > > > > Also the vec_step_op_add case will keep the original scalar IV
> > > > > live even when it is a vectorized induction. The code
> > > > > recomputing the value from scratch avoids this.
> > > > >
> > > > > /* For non-main exit create an intermediat edge to get any
> > > > > updated
> iv
> > > > > calculations. */
> > > > > if (needs_interm_block
> > > > > && !iv_block
> > > > > && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > > (new_stmts)))
> > > > > {
> > > > > iv_block = split_edge (update_e);
> > > > > update_e = single_succ_edge (update_e->dest);
> > > > > last_gsi = gsi_last_bb (iv_block);
> > > > > }
> > > > >
> > > > > this is also odd, can we adjust the API instead? I suppose this
> > > > > is because your computation uses the original loop IV, if you
> > > > > based the computation off the initial value only this might not be
> necessary?
> > > >
> > > > No, on the main exit the code updates the value in the loop header
> > > > and puts the Calculation in the merge block. This works because
> > > > it only needs to consume PHI nodes in the merge block and things
> > > > like niters are
> > > adjusted in the guard block.
> > > >
> > > > For an early exit, we don't have a guard block, only the merge block.
> > > > We have to update the PHI nodes in that block, but can't do so
> > > > since you can't produce a value and consume it in a PHI node in the same
> BB.
> > > > So we need to create the block to put the values in for use in the
> > > > merge block. Because there's no "guard" block for early exits.
> > >
> > > ? then compute niters in that block as well.
> >
> > We can't since it'll not be reachable through the right edge. What we
> > can do if you want is slightly change peeling, we currently peel as:
> >
> > \ \ /
> > E1 E2 Normal exit
> > \ | |
> > \ | Guard
> > \ | |
> > Merge block
> > |
> > Pre Header
> >
> > If we instead peel as:
> >
> >
> > \ \ /
> > E1 E2 Normal exit
> > \ | |
> > Exit join Guard
> > \ | |
> > Merge block
> > |
> > Pre Header
> >
> > We can use the exit join block. This would also mean
> > vect_update_ivs_after_vectorizer Doesn't need to iterate over all
> > exits and only really needs to adjust the phi nodes Coming out of the exit
> > join
> and guard block.
> >
> > Does this work for you?
> >
> > Thanks,
> > Tamar
> > >
> > > > The API can be adjusted by always creating the empty block either
> > > > during
> > > peeling.
> > > > That would prevent us from having to do anything special here.
> > > > Would that work better? Or I can do it in the loop that iterates
> > > > over the exits to before the call to
> > > > vect_update_ivs_after_vectorizer, which I think
> > > might be more consistent.
> > > >
> > > > >
> > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > niter which would be niters_vector_mult_vf - vf and be done with that?
> > > > >
> > > >
> > > > We can ofcourse not have this and recompute it from niters itself,
> > > > however this does affect the epilog code layout. Particularly
> > > > knowing the static number if iterations left causes it to usually
> > > > unroll the loop and share some of the computations. i.e. the
> > > > scalar code is often more
> > > efficient.
> > > >
> > > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > > since the value put Here is the remaining iteration count. It's
> > > > static for early
> > > exits.
> > >
> > > Well, it might be "static" in that it doesn't really matter what you
> > > use for the epilog main IV initial value as long as you are sure
> > > you're not going to take that exit as you are sure we're going to
> > > take one of the early exits. So yeah, the special code is probably
> > > OK, but it needs a better comment and as said the structure of
> vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > >
> > > As said an important part for optimization is to not keep the scalar
> > > IVs live in the vector loop.
> > >
> > > > But can do whatever you prefer here. Let me know what you prefer
> > > > for the
> > > above.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > > >
> > > > > > > > It has to do this since you have to perform the side
> > > > > > > > effects for the non-matching elements still.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > + if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > + continue;
> > > > > > > > > > +
> > > > > > > > > > + /* For early break the final loop IV is:
> > > > > > > > > > + init + (final - init) * vf which takes into
> > > > > > > > > > account
> peeling
> > > > > > > > > > + values and non-single steps. The main exit
> can
> > > > > > > > > > +use
> > > > > niters
> > > > > > > > > > + since if you exit from the main exit you've
> done
> > > > > > > > > > +all
> > > > > vector
> > > > > > > > > > + iterations. For an early exit we don't know
> when
> > > > > > > > > > +we
> > > > > exit
> > > > > > > > > > +so
> > > > > > > > > we
> > > > > > > > > > + must re-calculate this on the exit. */
> > > > > > > > > > + tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > + off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > + fold_convert (stype,
> start_expr),
> > > > > > > > > > + fold_convert (stype,
> init_expr));
> > > > > > > > > > + /* Now adjust for VF to get the final iteration
> > > > > > > > > > value.
> */
> > > > > > > > > > + off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > + build_int_cst (stype, vf));
> > > > > > > > > > + }
> > > > > > > > > > + else
> > > > > > > > > > + off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > + fold_convert (stype, niters),
> step_expr);
> > > > > > > > > > +
> > > > > > > > > > if (POINTER_TYPE_P (type))
> > > > > > > > > > ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > else
> > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > > > > > > (loop_vec_info
> > > > > > > > > loop_vinfo,
> > > > > > > > > > /* Don't bother call vect_peel_nonlinear_iv_init. */
> > > > > > > > > > else if (induction_type == vect_step_op_neg)
> > > > > > > > > > ni = init_expr;
> > > > > > > > > > + else if (restart_loop)
> > > > > > > > > > + continue;
> > > > > > > > >
> > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > Isn't that the correct old start value in
> > > > > > > all cases?
> > > > > > > > >
> > > > > > > > > > else
> > > > > > > > > > ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > niters, step_expr, @@
> > > > > > > > > > -
> > > 2245,9 +2295,20 @@
> > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > (loop_vec_info
> > > > > > > > > > loop_vinfo,
> > > > > > > > > >
> > > > > > > > > > var = create_tmp_var (type, "tmp");
> > > > > > > > > >
> > > > > > > > > > - last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > gimple_seq new_stmts = NULL;
> > > > > > > > > > ni_name = force_gimple_operand (ni, &new_stmts,
> > > > > > > > > > false, var);
> > > > > > > > > > +
> > > > > > > > > > + /* For non-main exit create an intermediat edge
> > > > > > > > > > + to get any
> > > > > updated iv
> > > > > > > > > > + calculations. */
> > > > > > > > > > + if (needs_interm_block
> > > > > > > > > > + && !iv_block
> > > > > > > > > > + && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > +!gimple_seq_empty_p
> > > > > > > > > (new_stmts)))
> > > > > > > > > > + {
> > > > > > > > > > + iv_block = split_edge (update_e);
> > > > > > > > > > + update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > + last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > + }
> > > > > > > > > > +
> > > > > > > > > > /* Exit_bb shouldn't be empty. */
> > > > > > > > > > if (!gsi_end_p (last_gsi))
> > > > > > > > > > {
> > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info
> > > > > > > > > > loop_vinfo, tree
> > > > > > > > > niters, tree nitersm1,
> > > > > > > > > > niters_vector_mult_vf steps. */
> > > > > > > > > > gcc_checking_assert (vect_can_advance_ivs_p
> (loop_vinfo));
> > > > > > > > > > update_e = skip_vector ? e : loop_preheader_edge
> > > > > > > > > > (epilog);
> > > > > > > > > > - vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > niters_vector_mult_vf,
> > > > > > > > > > - update_e);
> > > > > > > > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > + update_e = single_succ_edge (e->dest);
> > > > > > > > > > + bool inversed_iv
> > > > > > > > > > + = !vect_is_loop_exit_latch_pred
> (LOOP_VINFO_IV_EXIT
> > > > > (loop_vinfo),
> > > > > > > > > > + LOOP_VINFO_LOOP
> > > > > (loop_vinfo));
> > > > > > > > >
> > > > > > > > > You are computing this here and in
> > > vect_update_ivs_after_vectorizer?
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > + /* Update the main exit first. */
> > > > > > > > > > + vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > > > > + vf,
> > > > > > > niters_vector_mult_vf,
> > > > > > > > > > + update_e,
> inversed_iv);
> > > > > > > > > > +
> > > > > > > > > > + /* And then update the early exits. */
> > > > > > > > > > + for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > + {
> > > > > > > > > > + if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > + continue;
> > > > > > > > > > +
> > > > > > > > > > + vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > +
> niters_vector_mult_vf,
> > > > > > > > > > + exit, true);
> > > > > > > > >
> > > > > > > > > ... why does the same not work here? Wouldn't the
> > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > (loop_vinfo)->src) or similar? That is, whether the
> > > > > > > > > exit is at or after the main IV exit? (consider having
> > > > > > > > > two)
> > > > > > > > >
> > > > > > > > > > + }
> > > > > > > > > >
> > > > > > > > > > if (skip_epilog)
> > > > > > > > > > {
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <[email protected]> SUSE Software Solutions
> > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > > AG
> > > > > > > Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <[email protected]> SUSE Software Solutions
> > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <[email protected]>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
>
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)