On Mon, 17 Nov 2025, Christopher Bazley wrote: > > On 14/11/2025 11:15, Richard Sandiford wrote: > > Christopher Bazley <[email protected]> writes: > >>>> Agreed. The only valid situations seem to be: > >>>> > >>>> (1) a duplicate of a single zero, where: > >>>> > >>>> npatterns == nelts_per_pattern == encoded_nelts == 1 > >>>> > >>>> and the only encoded value is zero > >>>> > >>>> (2) the combination of: > >>>> > >>>> - nelts_per_pattern == 2 > >>>> - multiple_p (TYPE_VECTOR_SUBPARTS (type), npatterns) > >>>> - the second half of the encoded elements are all zeros > >>>> > >>>> But these combinations would not come about by chance. The caller > >>>> would have to take steps to ensure that they're true. So rather > >>>> than check for these relatively complex conditions, it might > >>>> be clearer to add a new gimple_build interface that explicitly > >>>> fills with zeros, using a normal array (instead of a > >>>> tree_vector_builder) for the explicitly-initialised elements. > >>> Would a new gimple_build_*_with_zeros function remove the need for > >>> vect_create_constant_vectors to pad with zeros at all? > >>> > >>> The design of vect_create_constant_vectors seems to be heavily built > >>> around use of a tree_vector_builder. I'm a bit reluctant to do > >>> anything that would require significant refactoring of > >>> vect_create_constant_vectors, or that would require this seemingly > >>> rather ordinary case to be treated specially. > > The current code is built for the normal VLA loop case, where the > > sequence of scalar constants needs to be repeated to fill a vector. > > For example, in: > > > > for (int i = 0; i < 100; ++i) > > { > > x[i*2] += 1; > > x[i*2 + 1] += 2; > > } > > > > we need { 1, 2, 1, 2, 1, 2, ... }. > > > > We can't do that filling explicitly at compile-time because we don't > > know how many copies are needed -- that depends on the runtime vector > > length. So instead we use a tree_vector_builder that encodes { 1, 2 } > > and says that the pattern needs to be repeated to fill a vector. > > > > This also works for fixed-length loop vectorisation because, in the > > general case, filling is needed there too. We could of course do the > > filling explicitly at compile time, but it would be somewhat wasted > > effort, since the resulting constant would be canonicalised back to > > the "{ 1, 2 } repeating" encoding. > > > > If you want to do something different for BB SLP then I think it makes > > sense that there is some difference in the way that the constant is > > constructed. It doesn't need to be a big difference. tree_vector_builder > > inherits from auto_vec, so it would be possible to create a new > > gimple_build_* that takes a vec (or, better, an array_slice) and still > > share the current tree_vector_builder code in vect_create_constant_vectors. > > I have an alternative to my original solution now, which doesn't require > modification of the gimple_build_vector function. Instead, I have added a new > gimple_build_vector_with_zero_padding function. It: > > * Prepares a vector of constructor elements and find out whether all of the > element values are constant. > > * If all element values are constant then it returns a new VECTOR_CST node. > Any elements for which no value is supplied will be zero. > > * Otherwise, it builds a constructor for only those element values that the > caller provided, then assigns the result of that constructor to a temporary > object.
In GIMPLE a CONSTRUCTOR node has not mentioned elements zero-filled auto-magically. So iff you assume that the target can create a VLA vector with a n-element prefix (with n <= lower_bound (nunits)) then you shouldn't need to do anything special. Of course this requires the targets vec_init to do the heavy lifting as IIRC there's no fallback in RTL expansion for VLA vector CONSTRUCTORs. But if code quality is awful why would we want to do this at all? In general I'd expect N operations to construct a vector with N non-zero leading elements. Richard. > * If there are no implicitly-zero trailing elements then it returns the value > of the temporary object. > > * Otherwise, it builds a mask that exposes only those lanes of the destination > vector type for which the caller provided values... > > * ... then copies unmasked elements from the temporary object to the > destination vector and assigns zero to masked elements. > > (The temporary object is needed because a non-constant constructor isn’t valid > for use in a ternary operation.) > > Unfortunately, this approach currently produces less efficient output, for > example: > > ptrue p15.b, vl16 ; *** :o( > .... > ptrue p7.b, vl3 > .... > movi d25, #0 ; *** :o( > .... > ld1b z1.b, p6/z, [sp] > sel z25.b, p15, z1.b, z25.b ; *** :o( > add z28.b, z28.b, z25.b > st1b z28.b, p7, [x1] > > -- Richard Biener <[email protected]> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
