Re: [PATCH] RISC-V: Add RTL pass to combine cm.popret with zero return value

Jeff Law Sat, 15 Nov 2025 20:32:14 -0800



On 11/5/25 7:50 AM, Kito Cheng wrote:

This patch implements a new RTL pass that combines "li a0, 0" and
"cm.popret" into a single "cm.popretz" instruction for the Zcmp
extension.

This optimization cannot be done during prologue/epilogue expansion
because it would cause shrink-wrapping to generate incorrect code as
documented in PR113715. The dedicated RTL pass runs after shrink-wrap
but before branch shortening, safely performing this combination.

gcc/ChangeLog:

        * config/riscv/riscv-opt-popretz.cc: New file.
        * config/riscv/riscv-passes.def: Insert pass_combine_popretz before
        pass_shorten_branches.
        * config/riscv/riscv-protos.h (make_pass_combine_popretz): New
        declaration.
        * config/riscv/t-riscv: Add riscv-opt-popretz.o build rule.
        * config.gcc (riscv*): Add riscv-opt-popretz.o to extra_objs.
        * common/config/riscv/Makefile: New file for building standalone
        tools.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/pr113715.c: New test.
        * gcc.target/riscv/rv32e_zcmp.c: Update expected output for
        test_popretz.
        * gcc.target/riscv/rv32i_zcmp.c: Likewise.

It took a little while to find my old comments on popretz. The core
issue with the earlier implementation was that it changed the RTL for
the main body of the function within the prologue/epilogue generation.
Specifically, it removed the assignment to a0, and depending on
decisions made by shrink-wrapping we could have scenarios where the
return value would be wrong.

That's the key issue we have to avoid. I don't really see an existing
pass after threading the prologue/epilogue that would help in any
meaningful way with generation of popretz.

+
+   Why not use peephole2?

In general I think peephole/peephole2 is rarely the right choice.

+   ----------------------
+   An alternative approach would be to use a peephole2 pattern to perform this
+   optimization. However, between "li a0, 0" and "cm.popret", there can be
+   STACK_TIE and other instructions that make it difficult to write a robust
+   peephole pattern that handles all cases.

I could formulate this as a delay slot filling problem. I'm not saying
we should, but I've solved similar problems using that framework in the
past. Essentially we claim that the popret has a delay slot and only
allow it to be filled with a li a0, 0. You then detect that the slot
was filled during assembly output and emit a popretz instead. It's
*gross*.

+           {
+             if (SET_SRC (def_pat) == const0_rtx)

You may want to use CONST0_RTX here so that you can capture 0.0 as a
double precision FP value. You need to know the mode to use CONST0_RTX,
so you'd probably need to capture the mode from the USE statement. I
would certainly understand if you don't think it's worth the effort.
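
To illustrate why the mode matters, here is a minimal standalone sketch.
It is a toy model, not GCC code: ToyMode, ToyConst, toy_const0 and
is_zero_for_mode are hypothetical names that only mimic the idea that
each mode has one shared zero object which is compared by pointer.

```cpp
#include <cassert>

// Toy stand-ins for machine modes (hypothetical, not GCC's machine_mode).
enum ToyMode { TOY_SI, TOY_DI, TOY_SF, TOY_DF };

// Toy stand-in for a constant; identity (address) is what gets compared.
struct ToyConst { double value; };

// One shared zero for all integer modes, plus one per FP mode.
static const ToyConst toy_int_zero{0.0};
static const ToyConst toy_sf_zero{0.0};
static const ToyConst toy_df_zero{0.0};

// Mode-aware lookup of the shared zero constant (assumed analogue of
// asking for "the zero of this mode").
const ToyConst *toy_const0(ToyMode mode)
{
  switch (mode) {
    case TOY_SF: return &toy_sf_zero;
    case TOY_DF: return &toy_df_zero;
    default:     return &toy_int_zero;
  }
}

// True if SRC is the shared zero for USE_MODE, the mode taken from the
// use of the return register.
bool is_zero_for_mode(const ToyConst *src, ToyMode use_mode)
{
  assert(src != nullptr);
  return src == toy_const0(use_mode);
}
```

In this model, a test against only the integer zero object would miss
an FP zero source, while the mode-aware lookup accepts it.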

You may also want to consider a limit on the number of instructions you
examine. You could have large BBs in play that will cause a
compile-time explosion (made the same mistake myself in the past)...
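
A rough sketch of the kind of bound I mean, written as a standalone toy
model rather than against the patch. ToyInsn, find_zero_def and the
max_scan parameter are all hypothetical; the real pass would walk RTL
insns and would pick its own limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an instruction in a basic block (not GCC's rtx_insn).
struct ToyInsn {
  std::string text;  // e.g. "li a0,0"
  bool sets_a0;      // instruction writes a0
  bool uses_a0;      // instruction reads a0
};

// Walk backwards from the instruction just before POPRET_IDX looking for
// the definition of a0.  Examine at most MAX_SCAN instructions so that a
// huge block cannot blow up compile time.  Returns the index of a
// "li a0,0" definition, or -1 if none is found within the limit.
int find_zero_def(const std::vector<ToyInsn> &bb, std::size_t popret_idx,
                  unsigned max_scan)
{
  assert(popret_idx <= bb.size());
  unsigned scanned = 0;
  for (std::size_t i = popret_idx; i-- > 0; ) {
    if (scanned++ >= max_scan)
      return -1;                       // budget exhausted, give up
    const ToyInsn &insn = bb[i];
    if (insn.sets_a0)                  // nearest definition of a0
      return insn.text == "li a0,0" ? static_cast<int>(i) : -1;
    if (insn.uses_a0)                  // a0 is read in between, unsafe
      return -1;
  }
  return -1;                           // reached the start of the block
}
```

Giving up when the budget runs out only loses an optimization
opportunity, so a small limit is always safe.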


So OK with or without the two minor issues above addressed.

Jeff
