https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123414

--- Comment #5 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Heh, it's more insidious than that, and it's indeed a middle-end issue, not a
target one.

For emulated reduction we use a scheme like

          /* Case 2: Create:
             for (offset = nelements/2; offset >= 1; offset/=2)
                {
                  Create:  va' = vec_shift <va, offset>
                  Create:  va = vop <va, va'>
                }  */

where the shifts are done through permutes.
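As a minimal scalar sketch of the case-2 scheme above (function name and the fixed buffer size are mine, not GCC's; the real code emits vector permutes, and the combining op here is addition):

```c
#include <assert.h>

/* Scalar model of the emulated reduction: repeatedly "shift" the
   vector down by OFFSET lanes (a permute in the real code), combine
   element-wise, and halve OFFSET.  Assumes N is a power of two and
   N <= 256.  After the loop, lane 0 holds the full reduction.  */
static int emulated_sum_reduction(int *va, int n)
{
    int shifted[256];
    for (int offset = n / 2; offset >= 1; offset /= 2) {
        /* va' = vec_shift <va, offset>  */
        for (int i = 0; i < n; i++)
            shifted[i] = (i + offset < n) ? va[i + offset] : 0;
        /* va = vop <va, va'>  (here: addition)  */
        for (int i = 0; i < n; i++)
            va[i] += shifted[i];
    }
    return va[0];
}
```

After log2(n) iterations the partial sums have all been folded into lane 0, which is why only that lane of the result matters and the remaining lanes can be overwritten with the neutral value.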

In order to keep the neutral value 1, the result of this reduction is inserted
into a vector like
{res, 1, 1, 1, ...}.

We optimize this vector constructor in forwprop, and the permute looks like

VEC_PERM_EXPR <res_vec, {1, 1, 1, ...}, {0, 256, 257, ..., 511}>.

Now with zvl256b and LMUL8, a char vector has 256 elements, but we use unsigned
char as the permute mask type.

Thus, we build a tree for the mask op {0, 1, 2, 3, ..., 255} (because 256
overflows the mask type).
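The wrap itself is just 8-bit truncation; a small sketch (the helper name is mine):

```c
/* A two-input permute of 256-element vectors needs selector values in
   [0, 511], but an unsigned char mask element truncates them modulo
   256, so every index meant for the second operand folds back onto
   the first.  */
static unsigned wrapped_sel(unsigned idx)
{
    return (unsigned char) idx;   /* 8-bit truncation */
}
```

So an index 256 + i, intended to pick lane i of the {1, 1, 1, ...} operand, degenerates to i and selects lane i of res_vec instead, and the permute no longer produces {res, 1, 1, ...}.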

LMUL once again testing the limits of vectors :)

I'm testing a patch.  I think the issue is here:

      mask_type
        = build_vector_type (build_nonstandard_integer_type (elem_size, 1),
                             refnelts);

where refnelts = 256 instead of 512.  For a two-input permute we need to be
able to use indices up to refnelts * 2 - 1.
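In other words, the mask element precision has to be sized for 2 * refnelts, not refnelts; a quick sanity check (helper name is mine, this is not the actual patch):

```c
/* Minimum number of bits needed to represent MAX_INDEX.  Mirrors the
   suspected fix: the permute mask element type must hold indices up
   to refnelts * 2 - 1.  */
static unsigned bits_needed(unsigned max_index)
{
    unsigned bits = 0;
    while (max_index) {
        bits++;
        max_index >>= 1;
    }
    return bits;
}
```

For refnelts = 256, indices within one operand (up to 255) fit in 8 bits, i.e. unsigned char, but two-operand indices (up to 511) need 9 bits, so the mask element type has to be widened.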
