Christopher Bazley <[email protected]> writes:
> Sorry about that. I agree that the name is not distinctive enough. The 
> original name of this function was 
> gimple_build_vector_with_zero_padding. I created it because of Richard 
> Sandiford's email here:
> https://inbox.sourceware.org/gcc-patches/[email protected]/
>
> Specifically, "..add a new gimple_build interface that explicitly fills 
> with zeros, using a normal array (instead of a tree_vector_builder) for 
> the explicitly-initialised element."
>
> I later renamed the function as gimple_build_vector_from_elems in v4 as 
> a result of your subsequent email:
> https://inbox.sourceware.org/gcc-patches/[email protected]/
>
> Specifically, "...In GIMPLE a CONSTRUCTOR node has not mentioned 
> elements zero-filled auto-magically.  So iff you assume that the target 
> can create a VLA vector with a n-element prefix (with n <= lower_bound 
> (nunits)) then you shouldn't need to do anything special."
>
> By that point, I had ran out of inventive names.
>
> Richard Sandiford wrote "If you want to do something different for BB 
> SLP then I think it makes sense that there is some difference in the way 
> that the constant is constructed."
>
> Do we want to do something different for BB SLP? I was happy enough with 
> the original version, which did use gimple_build_vector:
> https://inbox.sourceware.org/gcc-patches/[email protected]/
>
> That original version padded the vector of elements with zeros but had 
> nelts_per_pattern == 1, therefore I suppose that the patterns were 
> repeated in the upper lanes of each SVE register.

Right.  A similar problem applies to nelts_per_pattern == 2 unless you
explicitly zero the second element in each pattern.

> Based on Richard 
> Sandiford's email, he does not consider any situation to be valid other 
> than npatterns == nelts_per_pattern == encoded_nelts == 1, or 
> nelts_per_pattern == 2 and multiple_p (TYPE_VECTOR_SUBPARTS (type), 
> npatterns). Since the latest version of my patch uses 
> build_vector_from_ctor, it satisfies those constraints

Not sure whether this is how you understood it, but when I said:

  The only valid situations seem to be:

  (1) a duplicate of a single zero, where:

      npatterns == nelts_per_pattern == encoded_nelts == 1

      and the only encoded value is zero

  (2) the combination of:

      - nelts_per_pattern == 2
      - multiple_p (TYPE_VECTOR_SUBPARTS (type), npatterns)
      - the second half of the encoded elements are all zeros

I meant that those conditions seemed to be the ones that your code would
need to follow in order to fill VLA vectors with zeros.  Those conditions
certainly don't apply to all gimple_build_vector callers.

So to be correct, the original patch would need to add an assert to
gimple_build_vector that checks the above conditions before using
constant_lower_bound.  Like you say above, without the assert:

   { 1, x } nelts_per_pattern == 1, npatterns == 2

would incorrectly give:

   { 1, x, 0, 0, 0, 0, ... }

whereas it should instead be:

   { 1, x, 1, x, 1, x, ... }

The current VLS CONSTRUCTOR path is supposedly correct because it
explicitly initialises every element of the vector (i.e. it does not
rely on zero padding of the constructor).

My argument was that (1) or (2) above would not come about by chance.
The caller would have to do something explicit to ensure that (2) is true.
And if (1) or (2) is true, there is no need to for the tree_vector_builder.
An array of the leading nonzero elements would be good enough, and would
be simpler to build.

That's why I thought that a different interface from the tree_vector_builder
version would be better than adding a complicated assert to the existing
function.  It's also why...

> The only practical thing that my gimple_build_vector_from_elems function 
> does differently from gimple_build_vector for non-constant vectors is 
> that it does not rely on the vector type having a fixed length: instead, 
> it relies on implicit zero-filling of not-mentioned elements of a 
> CONSTRUCTOR node. Is that behaviour you would be willing to adopt in 
> gimple_build_vector?

...I don't think that this question really applies (assuming that
you're asking about the current gimple_build_vector).  By design,
tree_vector_builder implicitly specifies a full vector's worth of
elements based on a possibly shorter sequence of explicitly elements.
There are no ambiguities to be resolved.  There is nothing that is left
to gimple_build_vector's interpretation.

If we don't want tree_vector_builder semantics then we should provide
an interface that doesn't use tree_vector_builder, which is essentially
my argument above.

> There is a greater practical difference in the encoding of VECTOR_CST:
>
> The tree_vector_builder in vect_create_constant_vectors has
> one element per pattern, therefore encoded_nelts is simply the number of 
> patterns (i.e. element values) in gimple_build_vector. If the vector is 
> constant then builder->build () uses that encoding. However, 
> builder->build () requires the number of patterns to be an integral 
> power of 2, which may not be true for the number of element values 
> supplied by BB SLP.
>
> encoded_nelts is also the number of element values in 
> gimple_build_vector_from_elems, but here build_vector_from_ctor sets 
> step to 2 for VLA vector types (e.g., encoding {1, 0}, {2, 0}, {3, 0}, 
> {0, 0} if a VLA type with lower bound 4 existed) instead of step 1, 
> which is a different encoding from that used for the tree_vector_builder 
> in vect_create_constant_vectors (e.g., {1}, {2}, {3}). The latter would 
> be invalid without zero-padding as in my first patch version because the 
> number of patterns might not be an integral power of 2.

FWIW, the nomenclature and notation here isn't the one that I'm used to.
It sounds like you're using "step" to mean "nelts_per_pattern".  But to mem
"step 2" means an "encoding with nelts_per_pattern == 3 and a step of 2
between elements 1 and 2 of a pattern".  For example:

  { 1, 2, 4 }, nelts_per_pattern == 3, npatterns == 1

would have step 2, so that the full vector would be:

  { 1, 2, 4, 6, 8, 10, 12, ... }

The explicitly-encoded elements always come in vector order, so the
encoding for your VLA example above would be:

  { 1, 2, 3, 0, 0, 0, 0, 0 }, nelts_per_pattern == 2, npatterns == 4

> If a step-1 encoding can be valid for a VECTOR_CST of VLA type, then it 
> seems it would be more compact than a step-2 encoding. Use of a step-2 
> encoding does not appear to ensure that the number of patterns is 
> acceptable to builder->build(), nor does it avoid the need to pad the 
> vector with explicit zeros (on the contrary, it requires it). My current 
> understanding is that the purpose of the step-2 encoding is to make 
> explicit that the scalable tail is zero-filled rather than repeating the 
> minimum prefix, but there might be another reason I am not aware of.

That's right.  nelts_per_element == 1, npatterns == VF is heavily used
in VLA loop vectorisation, where it is critical that the VF elements
are repeated rather than padded with zeros.

nelts_per_element == 2 is useful for loop vectorisation in specific
contexts.  For example, it can be used to initialise a vector with
an initial accumulator value while padding the rest of the vector with
the accumulator identity value.  (The identity isn't always zero.
For example, it's -1 for AND reductions.)

Richard

Reply via email to