Re: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

Richard Sandiford Tue, 26 May 2020 08:58:51 -0700

Sorry for the slow reply, was off for a few days.

I think the new code ought to happen earlier in emit_move_insn, before:

  if (CONSTANT_P (y))
    {

That way, all the canonicalisation happens on the mode we actually
want the move to have.

"Yangfei (Felix)" <felix.y...@huawei.com> writes:
> diff --git a/gcc/expr.c b/gcc/expr.c
> index dfbeae71518..4442fb83367 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -3852,6 +3852,62 @@ emit_move_insn (rtx x, rtx y)
>  
>    gcc_assert (mode != BLKmode);
>  
> +  rtx x_inner = NULL_RTX;
> +  rtx y_inner = NULL_RTX;
> +  machine_mode x_inner_mode, y_inner_mode;
> +
> +  if (SUBREG_P (x)
> +      && REG_P (SUBREG_REG (x))
> +      && known_eq (SUBREG_BYTE (x), 0))
> +    {
> +      x_inner = SUBREG_REG (x);
> +      x_inner_mode = GET_MODE (x_inner);
> +    }
> +  if (SUBREG_P (y)
> +      && REG_P (SUBREG_REG (y))
> +      && known_eq (SUBREG_BYTE (y), 0))
> +    {
> +      y_inner = SUBREG_REG (y);
> +      y_inner_mode = GET_MODE (y_inner);
> +    }

The later code is only interested in SUBREG_REGs that are the same size
as "mode", so I think it would make sense to check that in the "if"s
above instead of checking SUBREG_BYTE.  (SUBREG_BYTE is always zero
when the modes are the same size, but the reverse is not true.)

It might also be better to avoid [xy]_inner_mode and just use GET_MODE
where necessary.

It would be good to have a block comment above the code to explain
what we're doing.

> +  if (x_inner != NULL_RTX
> +      && y_inner != NULL_RTX
> +      && x_inner_mode == y_inner_mode
> +      && known_eq (GET_MODE_SIZE (x_inner_mode), GET_MODE_SIZE (mode))
> +      && ! targetm.can_change_mode_class (x_inner_mode, mode, ALL_REGS))
> +    {
> +      x = x_inner;
> +      y = y_inner;
> +    }
> +  else if (x_inner != NULL_RTX && CONSTANT_P (y)

Formatting nit: one subcondition per line when the condition spans
multiple lines.

> +        && known_eq (GET_MODE_SIZE (x_inner_mode), GET_MODE_SIZE (mode))
> +        && ! targetm.can_change_mode_class (x_inner_mode, mode, ALL_REGS)
> +        && targetm.legitimate_constant_p (x_inner_mode, y))

This call isn't valid, since the mode has to match the rtx.  ("y" still
has mode "mode" at this point.)  I think instead we should just do:

           && (y_inner = simplify_subreg (GET_MODE (x_inner), y, mode, 0))

to convert the constant, and use it if the result is nonnull.
The existing CONSTANT_P emit_move_insn code will handle cases
in which the new constant isn't legitimate.

> +
> +    {
> +      x = x_inner;
> +    }
> +  else if (x_inner != NULL_RTX && MEM_P (y)
> +        && known_eq (GET_MODE_SIZE (x_inner_mode), GET_MODE_SIZE (mode))
> +        && ! targetm.can_change_mode_class (x_inner_mode, mode, ALL_REGS)
> +        && (! targetm.slow_unaligned_access (x_inner_mode, MEM_ALIGN (y))
> +            || MEM_ALIGN (y) >= GET_MODE_ALIGNMENT (x_inner_mode)))

What is the last condition protecting against?  Seems worth a comment.

Thanks,
Richard

Re: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

Reply via email to