Hi Richard,

on 2020/7/8 下午8:50, Richard Sandiford wrote:
> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>> […]
>>>> I tested the updated patch with this releasing, LOOP_VINFO_PEELING_FOR_GAPS
>>>> part looks fine, but LOOP_VINFO_PEELING_FOR_ALIGNMENT caused one case to
>>>> fail at execution during vect-partial-vector-usage=2.  So far the patch
>>>> doesn't handle any niters_skip cases.  I think if we want to support it, 
>>>> we have to add some handlings in/like what we have for masking, such as: 
>>>> mask_skip_niters, vect_prepare_for_masked_peels etc.  
>>>>
>>>> Do you prefer me to extend the support in this patch series?
>>>
>>> It's not so much whether it has to be supported now, but more why
>>> it doesn't work now.  What was the reason for the failure?
>>>
>>> The peeling-with-masking thing is just an optimisation, so that we
>>> can vectorise the peeled iterations rather than falling back to
>>> scalar code for them.  It shouldn't be needed for correctness.
>>>
>>
>> Whoops, thanks for the clarification!  Nice, I just realized it's a way to
>> adopt partial vectors for prologue.  The fail case is 
>> gcc.dg/vect/vect-ifcvt-11.c.
>> There the first iteration is optimized out due to the known AND result of
>> IV 0, then it tries to peel 3 iterations, the number of remaining iterations
>> for vectorization body is expected to be 12.  But it still uses 15 and causes
>> out-of-bound access.
>>
>> The below fix can fix the failure.  The justification is that we need to use
>> the fixed up niters after peeling prolog for the vectorization body for
>> partial vectors.  I'm not sure why the other cases not using partial vectors 
>> don't need the fixed up niters, to avoid troubles I guarded it with 
>> LOOP_VINFO_USING_PARTIAL_VECTORS_P explicitly.
> 
> I think the reason is that if we're peeling prologue iterations and
> the total number of iterations isn't fixed, full-vector vectorisation
> will “almost always” need an epilogue loop too, and in that case
> niters_vector will be nonnull.
> 
> But that's not guaranteed to be true forever.  E.g. if the start
> pointers have a known misalignment that require peeling a constant
> number of iterations N, and if we can prove (using enhanced range/
> nonzero-bits information) that the way niters is calculated means
> that niter - N is a multiple of the vector size, we could peel
> the prologue and not the epilogue.  In that case, what your patch
> does would be correct.
> 

Thanks for the explanation, it makes more sense!

> So…
> 
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -8888,6 +8896,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
>> *loop_vectorized_call)
>>                           LOOP_VINFO_INT_NITERS (loop_vinfo) / lowest_vf);
>>        step_vector = build_one_cst (TREE_TYPE (niters));
>>      }
>> +      else if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
>> +          && !vect_use_loop_mask_for_alignment_p (loop_vinfo))
>> +       vect_gen_vector_loop_niters (loop_vinfo, LOOP_VINFO_NITERS 
>> (loop_vinfo),
>> +                                &niters_vector, &step_vector,
>> +                                niters_no_overflow);
>>        else
>>      vect_gen_vector_loop_niters (loop_vinfo, niters, &niters_vector,
>>                                   &step_vector, niters_no_overflow);
> 
> …I think we should drop the LOOP_VINFO_USING_PARTIAL_VECTORS_P
> condition.  Could you also add a comment above the new call saying:
> 
>    /* vect_do_peeling subtracted the number of peeled prologue
>       iterations from LOOP_VINFO_NITERS.  */
> 
> It wasn't obvious to me where the update was happening when I first
> looked at the code.
> 
> Very minor, but maybe also switch the last two cases round so that
> “else” is the default behaviour and the “if”s are the exceptions.
> 
> OK with those changes, thanks.

Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu.

Committed it via r11-1978 by incorporating your comments.  Thanks!

BR,
Kewen

Reply via email to