On Tue, Jun 3, 2025 at 2:59 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> Extend the remove_redundant_vector pass to handle vector broadcasts from
> constant and variable scalars.  When broadcasting from constants and
> function arguments, we can place a single widest vector broadcast at
> entry of the nearest common dominator for basic blocks with all uses
> since constants and function arguments aren't changed.  For broadcast
> from variables with a single definition, the single definition is
> replaced with the widest broadcast.
>
> gcc/
>
> PR target/92080
> * config/i386/i386-expand.cc (ix86_expand_call): Set
> recursive_function to true for recursive call.
> * config/i386/i386-features.cc (ix86_place_single_vector_set):
> Add an argument for inner scalar, default to nullptr.  Set the
> source from inner scalar if not nullptr.
> (ix86_get_vector_load_mode): Add an argument for scalar mode and
> handle integer and float scalar modes.
> (replace_vector_const): Add an argument for scalar mode and pass
> it to ix86_get_vector_load_mode.
> (redundant_load_kind): New.
> (redundant_load): Likewise.
> (ix86_broadcast_inner): Likewise.
> (remove_redundant_vector_load): Also support const0_rtx and
> constm1_rtx broadcasts.  Handle vector broadcasts from constant
> and variable scalars.
> * config/i386/i386.h (machine_function): Add recursive_function.
>
> gcc/testsuite/
>
> * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to expect
> movdqa instead pxor.
> * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
> * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
> * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
> * gcc.target/i386/pr92080-4.c: New test.
> * gcc.target/i386/pr92080-5.c: Likewise.
> * gcc.target/i386/pr92080-6.c: Likewise.
> * gcc.target/i386/pr92080-7.c: Likewise.
> * gcc.target/i386/pr92080-8.c: Likewise.
> * gcc.target/i386/pr92080-9.c: Likewise.
> * gcc.target/i386/pr92080-10.c: Likewise.
> * gcc.target/i386/pr92080-11.c: Likewise.
> * gcc.target/i386/pr92080-12.c: Likewise.
> * gcc.target/i386/pr92080-13.c: Likewise.
> * gcc.target/i386/pr92080-14.c: Likewise.
> * gcc.target/i386/pr92080-15.c: Likewise.
> * gcc.target/i386/pr92080-16.c: Likewise.

>+  machine_mode mode = VOIDmode;
>+  fixed_size_mode candidate;
>+  FOR_EACH_MODE_IN_CLASS (mode, vklass)
>+    if (is_a<fixed_size_mode> (mode, &candidate)
>+       && GET_MODE_INNER (candidate) == scalar_mode
>+       && GET_MODE_SIZE (candidate) == size)
>+      return mode;
>+
>+  gcc_unreachable ();

Can we just use default_vectorize_related_mode to get the wanted mode,
or reuse the code in it?
1591/* The default implementation of TARGET_VECTORIZE_RELATED_MODE.
*/
1592
1593opt_machine_mode
1594default_vectorize_related_mode (machine_mode vector_mode,
1595                                scalar_mode element_mode,
1596                                poly_uint64 nunits)
1597{
1598  machine_mode result_mode;
1599  if ((maybe_ne (nunits, 0U)
1600       || multiple_p (GET_MODE_SIZE (vector_mode),
1601                      GET_MODE_SIZE (element_mode), &nunits))
1602      && mode_for_vector (element_mode, nunits).exists
(&result_mode)
1603      && VECTOR_MODE_P (result_mode)
1604      && targetm.vector_mode_supported_p (result_mode))
1605    return result_mode;
1606
1607  return opt_machine_mode ();
1608}

> +  else if (CONST_VECTOR_P (op))
>+    {
>+      rtx first = XVECEXP (op, 0, 0);
>+      for (int i = 1; i < nunits; ++i)
>+       {
>+         rtx tmp = XVECEXP (op, 0, i);
>+         /* Vector duplicate value.  */
>+         if (!rtx_equal_p (tmp, first))
>+           return nullptr;
>+       }
>+      if (!CONSTANT_P (first))
>+       return nullptr;
Is it really needed? Do we have a case where CONST_VECTOR_P has a
non-constant component?
Also I assume that allsame && CONST_VECTOR_P is already handled in
ix86_expand_vector_init, so do we really need to handle it here?
17725  /* If all values are identical, broadcast the value.  */
17726  if (all_same
17727      && ix86_expand_vector_init_duplicate (mmx_ok, mode, target,
17728                                            XVECEXP (vals, 0,
0)))
17729    return;


>+         /* Check if there is a matching redundant vector load.   */
>+         bool matched = false;
>+         FOR_EACH_VEC_ELT (loads, i, load)
It's expensive. Can we try with a hash like cse_insn?
>+           if (load->val
>+               && load->kind == kind
>+               && load->mode == scalar_mode
>+               /* Since CONST_INT load doesn't need memory, it must
>+                  be in the same basic block if it is in a recursive
>+                  call.  */
This part is a bit tricky, It's used to avoid some regression which I
guess just exposes some latent issue?
And I didn't see any reason why recursive_call_p  needs to be excluded
for CONST_INT load and same bb.
>+               && (!recursive_call_p
>+                   || load->bb == bb
>+                   || !(CONST_INT_P (load->val)
>+                        && load->kind == LOAD_VECTOR))
>+               && rtx_equal_p (load->val, val))
>+             {




>
> --
> H.J.



-- 
BR,
Hongtao

Reply via email to