https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79912
--- Comment #10 from mpf at gcc dot gnu.org ---
(In reply to Kito Cheng from comment #8)
> [1]
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 89567f7..148967b 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -3581,10 +3581,6 @@ riscv_hard_regno_mode_ok_p (unsigned int regno, enum
> machine_mode mode)
>        if (!FP_REG_P (regno + nregs - 1))
>         return false;
>
> -      if (GET_MODE_CLASS (mode) != MODE_FLOAT
> -         && GET_MODE_CLASS (mode) != MODE_COMPLEX_FLOAT)
> -       return false;
> -
>        /* Only use callee-saved registers if a potential callee is guaranteed
>          to spill the requisite width.  */
>        if (GET_MODE_UNIT_SIZE (mode) > UNITS_PER_FP_REG
> @@ -3634,7 +3630,7 @@ riscv_class_max_nregs (reg_class_t rclass, enum
> machine_mode mode)
>  static reg_class_t
>  riscv_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t rclass)
>  {
> -  return reg_class_subset_p (FP_REGS, rclass) ? FP_REGS :
> +  return reg_class_subset_p (FP_REGS, rclass) && TARGET_HARD_FLOAT ?
> FP_REGS :
>          reg_class_subset_p (GR_REGS, rclass) ? GR_REGS :
>          rclass;
>  }

I don't think you want to do this really. Allowing integer modes in FPRs can make a real mess and lead to extra cost moving to and from FPRs which can be slow as well as using additional register banks that would not normally be required. I.e. consider code that only uses integer types hitting the FPU registers. Assuming the FPRs can be managed in a lazy context fashion then this means you can introduce additional context switch overhead for non-floating-point processes which is additional waste.

Doesn't my original fix work for you? It should just lead to the loads being a different width but not using FPRs; I guess it could break something else though.

The problem here is that WORD_REGISTER_OPERATIONS allows a subreg and a reg of the same hard-register to be used without need for sign/zero extension but instead relying on the LOAD_EXTEND_OP rules.
The 'true' value is that of the inner mode, and there can be loads/stores in that inner mode elsewhere that expect the full width of the inner mode to be valid in memory. If you do an output reload in the outer mode and only store the outer-mode width in memory, then any inner-mode consumer will get junk in the upper bits.

There are of course occasions where this does not matter. In particular, input reloads could be done in the outer mode (when that is narrower) as long as any output reloads for the same instruction are done in the inner mode, i.e. keeping memory content consistent but reducing the size of loads.

Doing that kind of optimisation and getting it correct is far too invasive for stage 4, and aspects of the current behavior are necessary for correctness. I recommend that, on balance, for all targets the current behavior is a reasonable compromise.

I have said elsewhere that I am happy to continue working in this area and would welcome any further help to evaluate the effects of further work. EricB has offered his assistance and any additional help would also be good as this issue affects targets in different ways.
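
To make the upper-bits hazard described above concrete, here is a rough standalone model in plain C. It is not GCC code: the 32-bit "inner mode", the 8-bit "outer mode", the slot contents and all function names are made up for illustration. It only shows why an output reload stored at the narrower width leaves stale data for a later full-width load, while a full-width store keeps memory consistent.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical model, not GCC internals: a stack slot that backs a
   32-bit (inner-mode) pseudo register.  */

/* Output reload performed in the narrower 8-bit outer mode:
   only the low byte of the slot is updated, the rest is left alone.  */
static void store_outer_mode(uint32_t *slot, uint32_t reg)
{
    *slot = (*slot & ~UINT32_C(0xFF)) | (reg & UINT32_C(0xFF));
}

/* Output reload performed in the 32-bit inner mode:
   the whole slot is overwritten.  */
static void store_inner_mode(uint32_t *slot, uint32_t reg)
{
    *slot = reg;
}

/* An inner-mode consumer elsewhere reads the full width of the slot.  */
static uint32_t load_inner_mode(const uint32_t *slot)
{
    return *slot;
}

/* Runs both strategies against a slot that starts with stale contents.
   Returns 0 if the model behaves as described in the text.  */
static int spill_demo(void)
{
    uint32_t slot = UINT32_C(0xDEADBEEF);  /* stale bytes already in memory */
    uint32_t reg  = UINT32_C(0x00000012);  /* full inner-mode value in the register */

    store_outer_mode(&slot, reg);
    uint32_t after_narrow = load_inner_mode(&slot);
    printf("narrow store, wide load: 0x%08" PRIx32 "\n", after_narrow);

    store_inner_mode(&slot, reg);
    uint32_t after_wide = load_inner_mode(&slot);
    printf("wide store, wide load:   0x%08" PRIx32 "\n", after_wide);

    assert(after_narrow != reg);  /* upper bits are junk */
    assert(after_wide == reg);    /* memory matches the register */
    return 0;
}
```

Calling spill_demo() from a test harness exercises both paths. The narrower-load optimisation mentioned above corresponds to reading only the low byte while still using store_inner_mode for every write, which is the pairing this model does not break.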