Re: [PATCH v11 03/12] Update constant creation for BB SLP with predicated tails

Richard Biener Wed, 24 Jun 2026 04:12:36 -0700

On Mon, Jun 22, 2026 at 10:30 PM Richard Sandiford
<[email protected]> wrote:
>
> Christopher Bazley <[email protected]> writes:
> > Sorry about that. I agree that the name is not distinctive enough. The
> > original name of this function was
> > gimple_build_vector_with_zero_padding. I created it because of Richard
> > Sandiford's email here:
> > https://inbox.sourceware.org/gcc-patches/[email protected]/
> >
> > Specifically, "..add a new gimple_build interface that explicitly fills
> > with zeros, using a normal array (instead of a tree_vector_builder) for
> > the explicitly-initialised element."
> >
> > I later renamed the function as gimple_build_vector_from_elems in v4 as
> > a result of your subsequent email:
> > https://inbox.sourceware.org/gcc-patches/[email protected]/
> >
> > Specifically, "...In GIMPLE a CONSTRUCTOR node has not mentioned
> > elements zero-filled auto-magically.  So iff you assume that the target
> > can create a VLA vector with a n-element prefix (with n <= lower_bound
> > (nunits)) then you shouldn't need to do anything special."
> >
> > By that point, I had ran out of inventive names.
> >
> > Richard Sandiford wrote "If you want to do something different for BB
> > SLP then I think it makes sense that there is some difference in the way
> > that the constant is constructed."
> >
> > Do we want to do something different for BB SLP? I was happy enough with
> > the original version, which did use gimple_build_vector:
> > https://inbox.sourceware.org/gcc-patches/[email protected]/
> >
> > That original version padded the vector of elements with zeros but had
> > nelts_per_pattern == 1, therefore I suppose that the patterns were
> > repeated in the upper lanes of each SVE register.
>
> Right.  A similar problem applies to nelts_per_pattern == 2 unless you
> explicitly zero the second element in each pattern.
>
> > Based on Richard
> > Sandiford's email, he does not consider any situation to be valid other
> > than npatterns == nelts_per_pattern == encoded_nelts == 1, or
> > nelts_per_pattern == 2 and multiple_p (TYPE_VECTOR_SUBPARTS (type),
> > npatterns). Since the latest version of my patch uses
> > build_vector_from_ctor, it satisfies those constraints
>
> Not sure whether this is how you understood it, but when I said:
>
>   The only valid situations seem to be:
>
>   (1) a duplicate of a single zero, where:
>
>       npatterns == nelts_per_pattern == encoded_nelts == 1
>
>       and the only encoded value is zero
>
>   (2) the combination of:
>
>       - nelts_per_pattern == 2
>       - multiple_p (TYPE_VECTOR_SUBPARTS (type), npatterns)
>       - the second half of the encoded elements are all zeros
>
> I meant that those conditions seemed to be the ones that your code would
> need to follow in order to fill VLA vectors with zeros.  Those conditions
> certainly don't apply to all gimple_build_vector callers.
>
> So to be correct, the original patch would need to add an assert to
> gimple_build_vector that checks the above conditions before using
> constant_lower_bound.  Like you say above, without the assert:
>
>    { 1, x } nelts_per_pattern == 1, npatterns == 2
>
> would incorrectly give:
>
>    { 1, x, 0, 0, 0, 0, ... }
>
> whereas it should instead be:
>
>    { 1, x, 1, x, 1, x, ... }
>
> The current VLS CONSTRUCTOR path is supposedly correct because it
> explicitly initialises every element of the vector (i.e. it does not
> rely on zero padding of the constructor).
>
> My argument was that (1) or (2) above would not come about by chance.
> The caller would have to do something explicit to ensure that (2) is true.
> And if (1) or (2) is true, there is no need to for the tree_vector_builder.
> An array of the leading nonzero elements would be good enough, and would
> be simpler to build.
>
> That's why I thought that a different interface from the tree_vector_builder
> version would be better than adding a complicated assert to the existing
> function.


My argument is that the current API can cover the new use just fine and
we should avoid introducing new APIs for a single use which could then
well just build a properly constrained CTOR directly.  As we've legitimized
the zero-padded VLA typed CTOR variant it is IMHO sensible to have
gimple_build_vector support that case.

Did I say we have too many APIs already?

>  It's also why...
>
> > The only practical thing that my gimple_build_vector_from_elems function
> > does differently from gimple_build_vector for non-constant vectors is
> > that it does not rely on the vector type having a fixed length: instead,
> > it relies on implicit zero-filling of not-mentioned elements of a
> > CONSTRUCTOR node. Is that behaviour you would be willing to adopt in
> > gimple_build_vector?
>
> ...I don't think that this question really applies (assuming that
> you're asking about the current gimple_build_vector).  By design,
> tree_vector_builder implicitly specifies a full vector's worth of
> elements based on a possibly shorter sequence of explicitly elements.
> There are no ambiguities to be resolved.  There is nothing that is left
> to gimple_build_vector's interpretation.
>
> If we don't want tree_vector_builder semantics then we should provide
> an interface that doesn't use tree_vector_builder, which is essentially
> my argument above.

But we have that.  Build a CTOR and then an assignment.

>
> > There is a greater practical difference in the encoding of VECTOR_CST:
> >
> > The tree_vector_builder in vect_create_constant_vectors has
> > one element per pattern, therefore encoded_nelts is simply the number of
> > patterns (i.e. element values) in gimple_build_vector. If the vector is
> > constant then builder->build () uses that encoding. However,
> > builder->build () requires the number of patterns to be an integral
> > power of 2, which may not be true for the number of element values
> > supplied by BB SLP.
> >
> > encoded_nelts is also the number of element values in
> > gimple_build_vector_from_elems, but here build_vector_from_ctor sets
> > step to 2 for VLA vector types (e.g., encoding {1, 0}, {2, 0}, {3, 0},
> > {0, 0} if a VLA type with lower bound 4 existed) instead of step 1,
> > which is a different encoding from that used for the tree_vector_builder
> > in vect_create_constant_vectors (e.g., {1}, {2}, {3}). The latter would
> > be invalid without zero-padding as in my first patch version because the
> > number of patterns might not be an integral power of 2.
>
> FWIW, the nomenclature and notation here isn't the one that I'm used to.
> It sounds like you're using "step" to mean "nelts_per_pattern".  But to mem
> "step 2" means an "encoding with nelts_per_pattern == 3 and a step of 2
> between elements 1 and 2 of a pattern".  For example:
>
>   { 1, 2, 4 }, nelts_per_pattern == 3, npatterns == 1
>
> would have step 2, so that the full vector would be:
>
>   { 1, 2, 4, 6, 8, 10, 12, ... }
>
> The explicitly-encoded elements always come in vector order, so the
> encoding for your VLA example above would be:
>
>   { 1, 2, 3, 0, 0, 0, 0, 0 }, nelts_per_pattern == 2, npatterns == 4
>
> > If a step-1 encoding can be valid for a VECTOR_CST of VLA type, then it
> > seems it would be more compact than a step-2 encoding. Use of a step-2
> > encoding does not appear to ensure that the number of patterns is
> > acceptable to builder->build(), nor does it avoid the need to pad the
> > vector with explicit zeros (on the contrary, it requires it). My current
> > understanding is that the purpose of the step-2 encoding is to make
> > explicit that the scalable tail is zero-filled rather than repeating the
> > minimum prefix, but there might be another reason I am not aware of.
>
> That's right.  nelts_per_element == 1, npatterns == VF is heavily used
> in VLA loop vectorisation, where it is critical that the VF elements
> are repeated rather than padded with zeros.
>
> nelts_per_element == 2 is useful for loop vectorisation in specific
> contexts.  For example, it can be used to initialise a vector with
> an initial accumulator value while padding the rest of the vector with
> the accumulator identity value.  (The identity isn't always zero.
> For example, it's -1 for AND reductions.)
>
> Richard

Re: [PATCH v11 03/12] Update constant creation for BB SLP with predicated tails

Reply via email to