[Bug tree-optimization/111048] [14 Regression] Wrong AVX2 code on highway-1.0.6 on -O2 and above since r14-3243-ga7dba4a1c05a76

prathamesh3492 at gcc dot gnu.org via Gcc-bugs Fri, 18 Aug 2023 07:27:43 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111048


--- Comment #8 from prathamesh3492 at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #7)
>         = ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0)
>                           : VECTOR_CST_NPATTERNS (arg1);
> 
> should be q1 & 1 :)

Oops, sorry for the typo :/
And yes, that fixes the issue.

For more context, we have the following inputs to VEC_PERM_EXPR:
arg0 (1, 1): { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
arg1 (4, 1): { 255, 63, 15, 3, 255, 63, 15, 3, 255, 63, 15, 3, 255, 63, 15, 3 }
sel (2, 3): { 0, 16, 1, 17, 2, 18, ... }
arg0 len: 16
sel nelts: 16
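
To make the inputs above concrete, here is a small standalone Python model (not GCC code) that expands each (npatterns, nelts_per_pattern) encoding to 16 elements and applies the permute element by element. The helper name `expand` is made up for illustration. It assumes the usual VECTOR_CST reading: with three encoded elements per pattern, later elements continue with step = a2 - a1, and with fewer than three, the last encoded element repeats. It also assumes 8-bit unsigned elements for arg0 and arg1.

```python
def expand(npatterns, nelts_per_pattern, encoded, length, modulus=None):
    """Expand a pattern-encoded constant vector to `length` elements.

    Hypothetical helper modelling the VECTOR_CST encoding; not a GCC API.
    """
    assert len(encoded) == npatterns * nelts_per_pattern
    out = []
    for i in range(length):
        pattern, k = i % npatterns, i // npatterns
        if k < nelts_per_pattern:
            # Element is stored explicitly in the encoding.
            value = encoded[k * npatterns + pattern]
        elif nelts_per_pattern < 3:
            # One or two elements per pattern: the last one repeats.
            value = encoded[(nelts_per_pattern - 1) * npatterns + pattern]
        else:
            # Three elements per pattern: continue the series a1, a2, ...
            a1 = encoded[npatterns + pattern]
            a2 = encoded[2 * npatterns + pattern]
            value = a2 + (k - 2) * (a2 - a1)
        out.append(value % modulus if modulus else value)
    return out


LEN = 16
arg0 = expand(1, 1, [0], LEN, modulus=256)               # assumed 8-bit lanes
arg1 = expand(4, 1, [255, 63, 15, 3], LEN, modulus=256)
sel = expand(2, 3, [0, 16, 1, 17, 2, 18], LEN)

# VEC_PERM_EXPR selects from the concatenation of arg0 and arg1.
both = arg0 + arg1
expected = [both[i] for i in sel]

print(sel)       # [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]
print(expected)  # [0, 255, 0, 63, 0, 15, 0, 3, 0, 255, 0, 63, 0, 15, 0, 3]
```

In this model, element 13 of the element-wise result is 15.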

In valid_mask_for_fold_vec_perm_cst_p, for the sel pattern {16, 17, 18, ...},
arg_npatterns is erroneously set to VECTOR_CST_NPATTERNS (arg0), and we have:
step = 1, arg_npatterns = 1
Thus, step becomes a "multiple" of arg_npatterns and we (wrongly) return true
for this case.
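
A rough Python sketch of the effect of the typo follows. It is a model of the check described above, not the actual GCC code. It assumes that q1 is the quotient of a selector element by the input length, i.e. that it says which input vector the pattern reads from; the function name `step_check` and its parameters are made up for illustration.

```python
def step_check(sel_elt, step, length, arg0_npatterns, arg1_npatterns, mask):
    """Model of the arg_npatterns choice and the multiple-of check.

    `mask` is 0 for the typo (q1 & 0) and 1 for the intended test (q1 & 1).
    Assumption: q1 = sel_elt // length picks the input vector.
    """
    q1 = sel_elt // length
    arg_npatterns = arg0_npatterns if (q1 & mask) == 0 else arg1_npatterns
    return step % arg_npatterns == 0, arg_npatterns


# Pattern {16, 17, 18, ...}: step 1, reads from arg1 (npatterns 4).
buggy = step_check(17, 1, 16, arg0_npatterns=1, arg1_npatterns=4, mask=0)
fixed = step_check(17, 1, 16, arg0_npatterns=1, arg1_npatterns=4, mask=1)

print(buggy)  # (True, 1): (q1 & 0) is always 0, so arg0 is always picked
print(fixed)  # (False, 4): step 1 is not a multiple of 4
```

With the mask of 0, the test can never select arg1, whatever the selector element is.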

So in the loop below in fold_vec_perm_cst, we have res with the following encoding:
res (4, 3): { 0, 255, 0, 63, 0, 15, 0, 3, 0, 255, 0, 63, ... }

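Since len = 16 and the encoding only holds 4 * 3 = 12 elements, the remaining
elements have to be extrapolated.
Index 13 comes out as "a3" in the pattern: { 255, 15, 255, ... }
So the step gets computed as: a2 - a1 = 255 - 15 = 240
And IIUC the next element thus becomes: (255 + 240) % 256 = 239,
whereas selecting element-wise (sel[13] = 22, i.e. arg1[6]) gives 15.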
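
The arithmetic above can be reproduced with a few lines of Python. This is only a model of how a (4, 3) encoding is extended past its 12 stored elements, assuming 8-bit unsigned wraparound; `extrapolate` is a made-up helper name, not a GCC function.

```python
def extrapolate(encoded, npatterns, index, modulus=256):
    """Element `index` of a vector encoded with 3 elements per pattern."""
    pattern, k = index % npatterns, index // npatterns
    if k < 3:
        # Stored explicitly in the encoding.
        return encoded[k * npatterns + pattern]
    a1 = encoded[npatterns + pattern]
    a2 = encoded[2 * npatterns + pattern]
    step = a2 - a1
    # Continue the series a2, a2 + step, ... with wraparound.
    return (a2 + (k - 2) * step) % modulus


# res (4, 3) as produced with the wrong arg_npatterns.
res_encoded = [0, 255, 0, 63, 0, 15, 0, 3, 0, 255, 0, 63]

# Index 13 is a3 of pattern 1 = { 255, 15, 255, ... }: step = 255 - 15 = 240.
print(extrapolate(res_encoded, 4, 13))  # 239
```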

By correctly setting arg_npatterns = VECTOR_CST_NPATTERNS (arg1) for this
case, arg_npatterns becomes 4.
Since step == 1 is not a multiple of arg_npatterns, we return false
and use the fallback:
res_npatterns = 16, res_nelts_per_pattern = 1,
and the loop below correctly encodes the elements.

I will shortly send a patch after validating it.

Thanks,
Prathamesh
