On 11/5/25 7:50 AM, Kito Cheng wrote:
This patch implements a new RTL pass that combines "li a0, 0" and
"cm.popret" into a single "cm.popretz" instruction for the Zcmp
extension.

This optimization cannot be done during prologue/epilogue expansion
because it would cause shrink-wrapping to generate incorrect code as
documented in PR113715. The dedicated RTL pass runs after shrink-wrap
but before branch shortening, safely performing this combination.

gcc/ChangeLog:

        * config/riscv/riscv-opt-popretz.cc: New file.
        * config/riscv/riscv-passes.def: Insert pass_combine_popretz before
        pass_shorten_branches.
        * config/riscv/riscv-protos.h (make_pass_combine_popretz): New
        declaration.
        * config/riscv/t-riscv: Add riscv-opt-popretz.o build rule.
        * config.gcc (riscv*): Add riscv-opt-popretz.o to extra_objs.
        * common/config/riscv/Makefile: New file for building standalone
        tools.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/pr113715.c: New test.
        * gcc.target/riscv/rv32e_zcmp.c: Update expected output for
        test_popretz.
        * gcc.target/riscv/rv32i_zcmp.c: Likewise.

It took a little while to find my old comments on popretz. The core issue with the earlier implementation was that it changed the RTL for the main body of the function during prologue/epilogue generation. Specifically, it removed the assignment to a0, and depending on decisions made by shrink-wrapping we could have scenarios where the return value would be wrong.

That's the key issue we have to avoid. I don't really see an existing pass after threading the prologue/epilogue that would help in any meaningful way with generation of popretz.





+
+   Why not use peephole2?
In general I think peephole/peephole2 is rarely the right choice.


+   ----------------------
+   An alternative approach would be to use a peephole2 pattern to perform this
+   optimization. However, between "li a0, 0" and "cm.popret", there can be
+   STACK_TIE and other instructions that make it difficult to write a robust
+   peephole pattern that handles all cases.
I could formulate this as a delay slot filling problem. I'm not saying we should, but I've solved similar problems using that framework in the past. Essentially we claim that the popret has a delay slot and only allow it to be filled with a li a0, 0. You then detect that the slot was filled during assembly output and emit a popretz instead. It's *gross*.


+           {
+             if (SET_SRC (def_pat) == const0_rtx)
You may want to use CONST0_RTX here so that you can capture 0.0 as a double precision FP value. You need to know the mode to use CONST0_RTX, so you'd probably need to capture the mode from the USE statement. I would certainly understand if you don't think it's worth the effort.

You may also want to consider a limit on the number of instructions you examine. You could have large BBs in play that will cause a compile-time explosion (made the same mistake myself in the past)...
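
To make the scan limit concrete, here is a rough sketch using a toy instruction model rather than GCC's real insn API. Everything in it (the Kind enum, kMaxScan, find_li_a0_zero, and the limit of 16) is made up for illustration and is not taken from the patch:

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for an instruction; this is NOT GCC's rtx_insn.
enum class Kind { LoadZeroA0, StackTie, ClobberA0, Other };

// Hypothetical cap on how many instructions we walk backwards.
constexpr std::size_t kMaxScan = 16;

// Walk backwards from the instruction just before `popret_idx`, looking
// for "li a0, 0".  Give up as soon as a0 is overwritten by something
// else, or once `max_scan` instructions have been examined, so a huge
// basic block cannot blow up compile time.
bool find_li_a0_zero(const std::vector<Kind> &bb, std::size_t popret_idx,
                     std::size_t max_scan = kMaxScan)
{
  std::size_t scanned = 0;
  for (std::size_t i = popret_idx; i-- > 0;)
    {
      if (scanned++ >= max_scan)
        return false;           // Scan budget exhausted.
      switch (bb[i])
        {
        case Kind::LoadZeroA0:
          return true;          // Found the candidate to fold.
        case Kind::ClobberA0:
          return false;         // a0 is set to something else first.
        default:
          break;                // STACK_TIE and friends: keep looking.
        }
    }
  return false;
}
```

In the real pass the limit would more plausibly be a --param than a hard-coded constant, but that is a design choice for the patch author.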

So OK with or without the two minor issues above addressed.

Jeff
