https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111648

--- Comment #4 from prathamesh3492 at gcc dot gnu.org ---
(In reply to prathamesh3492 from comment #3)
> Created attachment 56037 [details]
> Untested fix
> 
> The issue is that when a1 is a multiple of the vector length, we end up
> creating the following encoding in the result:
> { base_elem, arg[0], arg[1], ... }, where arg is the chosen input vector,
> which is incorrect.
> 
> For the above case, the vectorizer pass creates
> VEC_PERM_EXPR<arg0, arg1, sel> where:
> arg0: { -16, -9, -10, -11 }
> arg1: { -12, -5, -6, -7 }
> sel = { 3, 4, 5, 6 }
> 
> arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> Since a1 = 4 and arg_len = 4, it ended up creating the result with the
> following encoding:
> res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
>     = { -11, -12, -5 }
> 
> So for res[4], it used S = (-5) - (-12) = 7
Typo: I meant res[3], not res[4]. Sorry.
> and hence computed it as -5 + 7 = 2 instead of arg1[2], i.e. -6,
> which is the difference we see in the output at -O0 vs -O2.
> 
> The patch tweaks the constraints in valid_mask_for_fold_vec_perm_cst_p to
> punt if a1 is a multiple of the vector length, so a1 ... ae only selects
> from the stepped part of the input vector, which seems to fix this issue.
> I will run a proper bootstrap+test and post it upstream.
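
To make the arithmetic above concrete, here is a small Python model of it.
This is only a sketch and not GCC code: the helper names decode_stepped and
select are hypothetical, and the decoding rule assumes the encoding quoted
above (npatterns = 1, nelts_per_pattern = 3), where every element past the
third is extrapolated using the step between the second and third encoded
elements.

```python
def decode_stepped(encoded, index):
    """Return element `index` of a vector encoded with npatterns = 1 and
    nelts_per_pattern = 3 (hypothetical helper, not a GCC function)."""
    if index < 3:
        return encoded[index]
    # Elements past the encoded ones continue the series with a fixed step.
    step = encoded[2] - encoded[1]
    return encoded[2] + (index - 2) * step


def select(arg0, arg1, sel_index):
    """Pick one element from the concatenation of arg0 and arg1."""
    n = len(arg0)
    return arg0[sel_index] if sel_index < n else arg1[sel_index - n]


arg0 = [-16, -9, -10, -11]
arg1 = [-12, -5, -6, -7]
sel = [3, 4, 5, 6]

# Element-by-element permutation: the result we want.
expected = [select(arg0, arg1, i) for i in sel]

# What the fold produced: only the first three elements are stored, and
# res[3] is extrapolated from them.
encoded = expected[:3]
folded = [decode_stepped(encoded, i) for i in range(len(sel))]

print(expected)  # [-11, -12, -5, -6]
print(folded)    # [-11, -12, -5, 2]
```

The two lists differ only in the last element: the step is taken across
arg0[3], arg1[0] and arg1[1], so it does not match the step within arg1.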
