Richard Biener <rguent...@suse.de> writes: > On Wed, 14 Feb 2024, Richard Sandiford wrote: > >> Richard Biener <rguent...@suse.de> writes: >> > The following avoids accessing out-of-bound vector elements when >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision >> > elements. The error was basing the number of elements to extract >> > on the rounded up total byte size involved and the patch bases >> > everything on the total number of elements to extract instead. >> >> It's too long ago to be certain, but I think this was a deliberate choice. >> The point of the new vector constant encoding is that it can give an >> allegedly sensible value for any given index, even out-of-range ones. >> >> Since the padding bits are undefined, we should in principle have a free >> choice of what to use. And for VLA, it's often better to continue the >> existing pattern rather than force to zero. >> >> I don't strongly object to changing it. I think we should be careful >> about relying on zeroing for correctness though. The bits are in principle >> undefined and we can't rely on reading zeros from equivalent memory or >> register values. > > The main motivation for a change here is to allow catching out-of-bound > indices again for VECTOR_CST_ELT, at least for constant nunits because > it might be a programming error like fat-fingering the index. I do > think it's a regression that we no longer catch those. > > It's probably also a bit non-obvious how an encoding continues and > there might be DImode masks that can be represented by a > zero-extended QImode immediate but "continued" it would require > a larger immediate. > > The change also effectively only changes something for 1 byte > encodings since nunits is a power of two and so is the element > size in bits.
Yeah, but even there, there's an argument that all-1s (0xff) is a more obvious value for an all-1s mask. > A patch restoring the VECTOR_CST_ELT checking might be the > following > > diff --git a/gcc/tree.cc b/gcc/tree.cc > index 046a558d1b0..4c9b05167fd 100644 > --- a/gcc/tree.cc > +++ b/gcc/tree.cc > @@ -10325,6 +10325,9 @@ vector_cst_elt (const_tree t, unsigned int i) > if (i < encoded_nelts) > return VECTOR_CST_ENCODED_ELT (t, i); > > + /* Catch out-of-bound element accesses. */ > + gcc_checking_assert (maybe_gt (VECTOR_CST_NELTS (t), i)); > + > /* If there are no steps, the final encoded value is the right one. */ > if (!VECTOR_CST_STEPPED_P (t)) > { > > but it triggers quite a bit via const_binop for, for example > > #2 0x00000000011c1506 in const_binop (code=PLUS_EXPR, > arg1=<vector_cst 0x7ffff6f40b40>, arg2=<vector_cst 0x7ffff6e731e0>) > (gdb) p debug_generic_expr (arg1) > { 12, 13, 14, 15 } > $5 = void > (gdb) p debug_generic_expr (arg2) > { -2, -2, -2, -3 } > (gdb) p count > $4 = 6 > (gdb) l > 1711 if (!elts.new_binary_operation (type, arg1, arg2, > step_ok_p)) > 1712 return NULL_TREE; > 1713 unsigned int count = elts.encoded_nelts (); > 1714 for (unsigned int i = 0; i < count; ++i) > 1715 { > 1716 tree elem1 = VECTOR_CST_ELT (arg1, i); > 1717 tree elem2 = VECTOR_CST_ELT (arg2, i); > 1718 > 1719 tree elt = const_binop (code, elem1, elem2); > > this seems like an error to me - why would we, for fixed-size > vectors and for PLUS ever create a vector encoding with 6 elements?! > That seems at least inefficient to me? It's a case of picking your poison. On the other side, operating individually on each element of a V64QI is inefficient when the representation says up-front that all elements are equal. Fundemantally, operations on VLA vectors are treated as functions that map patterns to patterns. The number of elements that are consumed isn't really relevant to the function itself. The VLA folders therefore rely on being to read an element from a pattern even if the index is outside TREE_VECTOR_SUBPARTS. There were two reasons for using VLA paths for VLS vectors. One I mentioned above: it saves time when all elements are equal, or have a similarly compact representation. The other is that it makes VLA less special and ensures that the code gets more testing. Maybe one compromise between that and the assert would be: (1) enforce the assert only for VLS and (2) add new checks to ensure that a VLA-friendly operation will never read out-of-bounds for VLS vectors But I think this would be awkward. E.g. we now build reversed vectors as a 3-element series N-1, N-2, N-3. It would be nice not to have to special-case N==2 by suppressing N-3. And the condition for (2) might not always be obvious. Another option would be to have special accessors that are allowed to read out of bounds, and add the assert (for both VLA & VLS) to VECTOR_CST_ELT. It might take a while to root out all the places that need to change though. Thanks, Richard