Hi Richard, on 2020/7/8 下午8:50, Richard Sandiford wrote: > "Kewen.Lin" <li...@linux.ibm.com> writes: >>> […] >>>> I tested the updated patch with this releasing, LOOP_VINFO_PEELING_FOR_GAPS >>>> part looks fine, but LOOP_VINFO_PEELING_FOR_ALIGNMENT caused one case to >>>> fail at execution during vect-partial-vector-usage=2. So far the patch >>>> doesn't handle any niters_skip cases. I think if we want to support it, >>>> we have to add some handlings in/like what we have for masking, such as: >>>> mask_skip_niters, vect_prepare_for_masked_peels etc. >>>> >>>> Do you prefer me to extend the support in this patch series? >>> >>> It's not so much whether it has to be supported now, but more why >>> it doesn't work now. What was the reason for the failure? >>> >>> The peeling-with-masking thing is just an optimisation, so that we >>> can vectorise the peeled iterations rather than falling back to >>> scalar code for them. It shouldn't be needed for correctness. >>> >> >> Whoops, thanks for the clarification! Nice, I just realized it's a way to >> adopt partial vectors for prologue. The fail case is >> gcc.dg/vect/vect-ifcvt-11.c. >> There the first iteration is optimized out due to the known AND result of >> IV 0, then it tries to peel 3 iterations, the number of remaining iterations >> for vectorization body is expected to be 12. But it still uses 15 and causes >> out-of-bound access. >> >> The below fix can fix the failure. The justification is that we need to use >> the fixed up niters after peeling prolog for the vectorization body for >> partial vectors. I'm not sure why the other cases not using partial vectors >> don't need the fixed up niters, to avoid troubles I guarded it with >> LOOP_VINFO_USING_PARTIAL_VECTORS_P explicitly. > > I think the reason is that if we're peeling prologue iterations and > the total number of iterations isn't fixed, full-vector vectorisation > will “almost always” need an epilogue loop too, and in that case > niters_vector will be nonnull. > > But that's not guaranteed to be true forever. E.g. if the start > pointers have a known misalignment that require peeling a constant > number of iterations N, and if we can prove (using enhanced range/ > nonzero-bits information) that the way niters is calculated means > that niter - N is a multiple of the vector size, we could peel > the prologue and not the epilogue. In that case, what your patch > does would be correct. >
Thanks for the explanation, it makes more sense! > So… > >> --- a/gcc/tree-vect-loop.c >> +++ b/gcc/tree-vect-loop.c >> @@ -8888,6 +8896,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple >> *loop_vectorized_call) >> LOOP_VINFO_INT_NITERS (loop_vinfo) / lowest_vf); >> step_vector = build_one_cst (TREE_TYPE (niters)); >> } >> + else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) >> + && !vect_use_loop_mask_for_alignment_p (loop_vinfo)) >> + vect_gen_vector_loop_niters (loop_vinfo, LOOP_VINFO_NITERS >> (loop_vinfo), >> + &niters_vector, &step_vector, >> + niters_no_overflow); >> else >> vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector, >> &step_vector, niters_no_overflow); > > …I think we should drop the LOOP_VINFO_USING_PARTIAL_VECTORS_P > condition. Could you also add a comment above the new call saying: > > /* vect_do_peeling subtracted the number of peeled prologue > iterations from LOOP_VINFO_NITERS. */ > > It wasn't obvious to me where the update was happening when I first > looked at the code. > > Very minor, but maybe also switch the last two cases round so that > “else” is the default behaviour and the “if”s are the exceptions. > > OK with those changes, thanks. Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu. Committed it via r11-1978 by incorporating your comments. Thanks! BR, Kewen