https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79912

--- Comment #10 from mpf at gcc dot gnu.org ---
(In reply to Kito Cheng from comment #8)
> [1]
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 89567f7..148967b 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -3581,10 +3581,6 @@ riscv_hard_regno_mode_ok_p (unsigned int regno, enum machine_mode mode)
>        if (!FP_REG_P (regno + nregs - 1))
>         return false;
>  
> -      if (GET_MODE_CLASS (mode) != MODE_FLOAT
> -         && GET_MODE_CLASS (mode) != MODE_COMPLEX_FLOAT)
> -       return false;
> -
>        /* Only use callee-saved registers if a potential callee is guaranteed
>          to spill the requisite width.  */
>        if (GET_MODE_UNIT_SIZE (mode) > UNITS_PER_FP_REG
> @@ -3634,7 +3630,7 @@ riscv_class_max_nregs (reg_class_t rclass, enum machine_mode mode)
>  static reg_class_t
>  riscv_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t rclass)
>  {
> -  return reg_class_subset_p (FP_REGS, rclass) ? FP_REGS :
> +  return reg_class_subset_p (FP_REGS, rclass) && TARGET_HARD_FLOAT ? FP_REGS :
>          reg_class_subset_p (GR_REGS, rclass) ? GR_REGS :
>          rclass;
>  }

I don't think you really want to do this.  Allowing integer modes in FPRs can
make a real mess: it adds the cost of moving values to and from FPRs, which can
be slow, and it uses an additional register bank that would not normally be
required, e.g. code that only uses integer types ends up hitting the FPU
registers.  If FPR context is saved and restored lazily, this also introduces
extra context-switch overhead for non-floating-point processes, which is
additional waste.

Doesn't my original fix work for you? It should just lead to the loads being a
different width but not using FPRs; I guess it could break something else
though.

The problem here is that WORD_REGISTER_OPERATIONS allows a subreg and a reg of
the same hard register to be used without explicit sign/zero extension, relying
instead on the LOAD_EXTEND_OP rules. The 'true' value is that of the inner
mode, and there can be loads/stores in that inner mode elsewhere that expect
the full width of the inner mode to be valid in memory. If you do an output
reload in the outer mode and only store outer-mode width to memory, then any
inner-mode consumer will get junk in the upper bits.  There are of course
occasions where this does not matter. In particular, input reloads could be
done in the outer mode (when that is narrower) as long as any output reloads
for the same instruction are done in the inner mode, i.e. keeping memory
content consistent but reducing the size of loads. Doing that kind of
optimisation and getting it correct is far too invasive for stage 4, and
aspects of the current behavior are necessary for correctness.
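
As a rough illustration of the junk-in-the-upper-bits problem, here is a toy
model in plain C. It is not GCC code: the 32-bit slot, the 8-bit outer mode and
all of the helper names are made up for the example, and it only mimics what
ends up in memory for each kind of reload.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: a 32-bit stack slot standing in for the spilled pseudo.
   All names here are hypothetical; none of this is GCC code.  */
uint32_t slot;

/* Output reload in the narrower outer mode: only the low 8 bits reach
   memory, the other 24 bits keep whatever was in the slot before.  */
void spill_outer_mode (uint32_t reg)
{
  slot = (slot & ~UINT32_C (0xFF)) | (reg & UINT32_C (0xFF));
}

/* Output reload in the inner mode: the full 32 bits reach memory.  */
void spill_inner_mode (uint32_t reg)
{
  slot = reg;
}

/* Inner-mode consumer: expects all 32 bits of the slot to be valid.  */
uint32_t reload_inner_mode (void)
{
  return slot;
}

/* Input reload in the outer mode: reads only the low 8 bits.  */
uint8_t reload_outer_mode (void)
{
  return (uint8_t) (slot & UINT32_C (0xFF));
}

/* Walks through both cases; the asserts state what the model yields.  */
void demo (void)
{
  const uint32_t value = UINT32_C (0x00000142);

  slot = UINT32_C (0xDEADBEEF);            /* stale contents of the slot */
  spill_outer_mode (value);
  assert (reload_inner_mode () != value);  /* upper bits are junk */

  slot = UINT32_C (0xDEADBEEF);
  spill_inner_mode (value);
  assert (reload_inner_mode () == value);  /* memory is consistent */
  assert (reload_outer_mode () == 0x42);   /* narrow input reload still fine */
}
```

In this model the narrow output reload leaves the stale upper 24 bits in the
slot, so the inner-mode read sees the wrong value, while a narrow input reload
after a full-width output reload gives the expected low bits.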

On balance, I would argue that the current behavior is a reasonable compromise
for all targets. I have said elsewhere that I am happy to continue working in
this area and would welcome any further help to evaluate the effects of further
work. EricB has offered his assistance, and any additional help would also be
good as this issue affects targets in different ways.
