On Fri, Dec 27, 2024 at 3:19 AM Robin Dapp <[email protected]> wrote:
>
> Thanks for the helpful suggestion. The attached v2 patch tries to implement
> it.
>
> It was bootstrapped and regtested on x86, aarch64 and Power 10.
> Also regtested on rv64gcv_zvl512b.
>
> Those are all little-endian (sub)targets, though, so it would certainly be
> helpful if you could run additional aarch64 big-endian tests. I haven't seen
> any scan-assembler failures so far.
>
> Regards
> Robin
>
>
> [PATCH v2] varasm: Use native_encode_rtx for constant vectors.
>
> optimize_constant_pool hashes vector masks by native_encode_rtx and
> merges identically hashed values in the constant pool. Afterwards the
> optimized values are written in output_constant_pool_2.
>
> However, native_encode_rtx and output_constant_pool_2 disagree in their
> encoding of vector masks: native_encode_rtx does not pad with zeroes
> while output_constant_pool_2 implicitly does.
>
> In RVV's shuffle-evenodd-run.c there are two masks
> (a) "0101" for V4BI
> (b) "01010101" for V8BI and
> that have the same representation/encoding ("1010101") in native_encode_rtx.
> output_constant_pool_2 uses "101" for (a) and "1010101" for (b).
>
> Now, optimize_constant_pool might happen to merge both masks using
> (a) as representative. Then, output_constant_pool_2 will output "1010"
> which is only valid for the second mask as the implicit zero padding
> doesn't agree with (b).
> (b)'s "1010101" works for both masks as a V4BI load will ignore the last four
> padding bits.
>
> This patch makes output_constant_pool_2 use native_encode_rtx so both
> functions will agree on an encoding and output the correct constant.
>
> gcc/ChangeLog:
>
> * varasm.cc (output_constant_pool_2): Use native_encode_rtx for
> building the memory image of a const vector mask.
> ---
> gcc/varasm.cc | 40 +++++++++++++---------------------------
> 1 file changed, 13 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 0068ec2ce4d..9e86efcd178 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -4301,34 +4301,20 @@ output_constant_pool_2 (fixed_size_mode mode, rtx x,
> unsigned int align)
> {
> gcc_assert (GET_CODE (x) == CONST_VECTOR);
>
> - /* Pick the smallest integer mode that contains at least one
> - whole element. Often this is byte_mode and contains more
> - than one element. */
> - unsigned int nelts = GET_MODE_NUNITS (mode);
> - unsigned int elt_bits = GET_MODE_PRECISION (mode) / nelts;
> - unsigned int int_bits = MAX (elt_bits, BITS_PER_UNIT);
> - scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
> - unsigned int mask = GET_MODE_MASK (GET_MODE_INNER (mode));
> -
> - /* We allow GET_MODE_PRECISION (mode) <= GET_MODE_BITSIZE (mode) but
> - only properly handle cases where the difference is less than a
> - byte. */
> - gcc_assert (GET_MODE_BITSIZE (mode) - GET_MODE_PRECISION (mode) <
> - BITS_PER_UNIT);
> -
> - /* Build the constant up one integer at a time. */
> - unsigned int elts_per_int = int_bits / elt_bits;
> - for (unsigned int i = 0; i < nelts; i += elts_per_int)
> + auto_vec<target_unit, 128> buffer;
Let me ask the stupid question, where did 128 magic # come from? I am
suspecting it is the most common vector size in bytes but I am not
100% sure. Maybe a comment would be a good idea. Or should it be 512
for the "max" known vector size on x86_64? (256 exists for AVX, 512
exists but AVX512 is not as common as AVX).
> + buffer.truncate (0);
I don't think you need a truncate here since the vect size is originally 0.
Thanks,
Andrew
> + buffer.reserve (GET_MODE_SIZE (mode));
> +
> + bool ok = native_encode_rtx (mode, x, buffer, 0, GET_MODE_SIZE
> (mode));
> + gcc_assert (ok);
> +
> + for (unsigned i = 0;
> + i < GET_MODE_SIZE (mode) / GET_MODE_SIZE (byte_mode);
> + i++)
> {
> - unsigned HOST_WIDE_INT value = 0;
> - unsigned int limit = MIN (nelts - i, elts_per_int);
> - for (unsigned int j = 0; j < limit; ++j)
> - {
> - auto elt = INTVAL (CONST_VECTOR_ELT (x, i + j));
> - value |= (elt & mask) << (j * elt_bits);
> - }
> - output_constant_pool_2 (int_mode, gen_int_mode (value, int_mode),
> - i != 0 ? MIN (align, int_bits) : align);
> + unsigned HOST_WIDE_INT value = buffer[i];
> + output_constant_pool_2 (byte_mode, gen_int_mode (value,
> byte_mode),
> + i == 0 ? 0 : 1);
> }
> break;
> }
> --
> 2.47.1
>