Before implementing Zcmp, I made some optimizations to save-restore and
restructured it:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a5b2a3bff8152aa34408d8ce40add82f4d22ff87
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a782346757c54a5a3cfb9f416a7ebe3554a617d7

With these in place, Zcmp can share the save-restore stack-allocation
logic: pre-allocation by cm.push, followed by step 1 and step 2.
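As a rough illustration of the pre-allocation part, the C sketch below models how much of a frame a single cm.push can allocate. It follows the Zcmp specification's encoding, stack_adj = stack_adj_base + spimm * 16 with spimm in 0..3, where stack_adj_base is the size of the pushed register list rounded up to 16. This is not the patch's code. The helper names are hypothetical, and treating the leftover bytes as what step 1 and step 2 would handle is an assumption about how the split works.

```c
/* Hypothetical helpers modelling the Zcmp cm.push stack adjustment:
   stack_adj = stack_adj_base + spimm * 16, with spimm in [0, 3].  */

/* Round X up to a multiple of 16; cm.push always moves sp by a
   multiple of 16.  */
static unsigned align16 (unsigned x)
{
  return (x + 15u) & ~15u;
}

/* stack_adj_base: bytes needed for the NREGS pushed registers
   (ra included), each XLEN_BYTES wide, rounded up to 16.  */
static unsigned stack_adj_base (unsigned nregs, unsigned xlen_bytes)
{
  return align16 (nregs * xlen_bytes);
}

/* Split a FRAME-byte allocation into the part cm.push can do itself
   (returned) and the remainder left for the later allocation steps
   (stored in *REST).  FRAME is assumed to include the register save area.  */
static unsigned zcmp_push_alloc (unsigned frame, unsigned nregs,
                                 unsigned xlen_bytes, unsigned *rest)
{
  unsigned base = stack_adj_base (nregs, xlen_bytes);
  unsigned total = align16 (frame);
  unsigned spimm = 0;

  if (total > base)
    {
      spimm = (total - base) / 16;   /* extra 16-byte units wanted */
      if (spimm > 3)
        spimm = 3;                   /* the encoding caps spimm at 3 */
    }

  unsigned alloc = base + spimm * 16;
  *rest = total > alloc ? total - alloc : 0;
  return alloc;
}
```

For example, a 12-byte frame with only ra pushed on RV32 (nregs = 1, xlen_bytes = 4) yields 16, which matches the -16 in cm.push {ra}, -16. A 160-byte frame with {ra, s0-s1} yields 64 from cm.push and leaves 96 bytes for the remaining steps. On RV32E only {ra}, {ra, s0} and {ra, s0-s1} are available as register lists.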

Please note that cm.push pushes ra, s0-s11 in the reverse order from
save-restore, so my patch adapts the .cfi directives accordingly. The
discussion can be found here:
https://github.com/riscv/riscv-code-size-reduction/issues/182
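The ordering difference can be made concrete by computing each register's offset from the CFA (the value of sp on entry). The Zcmp specification's cm.push pseudo-code makes its first store at sp - XLEN/8 and walks downward through s11, s10, ..., s0, ra, skipping registers that are not in the list. The save-restore layout is assumed here to match GCC's libgcc save-restore routines, with ra closest to the CFA, then s0, s1, and so on. The helper names below are hypothetical.

```c
/* Hypothetical helpers: CFA-relative offset of the register at position IDX
   in the list {ra, s0, s1, ...} (IDX 0 = ra, IDX k >= 1 = s(k-1)).
   NREGS is the list length and BYTES is XLEN / 8.  */

/* save-restore layout: ra is closest to the CFA, followed by s0, s1, ...  */
static int save_restore_cfa_offset (int idx, int nregs, int bytes)
{
  (void) nregs;                 /* not needed for this layout */
  return -(idx + 1) * bytes;
}

/* cm.push layout: the first store goes to sp - BYTES and holds the
   highest-numbered s-register in the list; ra ends up lowest.  */
static int zcmp_push_cfa_offset (int idx, int nregs, int bytes)
{
  return -(nregs - idx) * bytes;
}
```

For {ra, s0-s1} on RV32, s1 lands at CFA - 4 and ra at CFA - 12 under cm.push. Under save-restore the positions are swapped: ra is at CFA - 4 and s1 at CFA - 12. Each .cfi_offset therefore has to be computed from the cm.push layout.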

A few weeks ago, Jiawei also posted Zcmp support in
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615287.html:
[PATCH 0/5] RISC-V: Support ZC* extensions.   Jiawei
[PATCH 1/5] RISC-V: Minimal support for ZC extensions.   Jiawei
[PATCH 2/5] RISC-V: Enable compressible features when use ZC* extensions.   Jiawei
[PATCH 3/5] RISC-V: Add ZC* test for march args being passed.   Jiawei
[PATCH 4/5] RISC-V: Add Zcmp extension supports.   Jiawei
[PATCH 5/5] RISC-V: Add ZCMP push/pop testcases.   Jiawei

I tested his code and observed some issues in [PATCH 4/5], so I am
posting my code as an alternative to Jiawei's [PATCH 4/5].

My Zcmp switch code is almost the same as Jiawei's, so I do not repeat it
in my patch series. Please pick up Jiawei's [PATCH 1/5] before picking up
my patch series.


Here are some comparisons. The left side is the result from Jiawei's
patch (REF); the right side is from my patch.

1. REF fails to generate Zcmp instructions.
TC: rv32e_zcmp.c
foo:                                               foo:
    addi      sp,sp,-12                                cm.push   {ra}, -16
    sw        ra,8(sp)
    call      f1                                       call      f1
    lw        ra,8(sp)                                 cm.pop    {ra}, 16
    addi      sp,sp,12
    tail      f2                                       tail      f2

2. REF fails to restore registers correctly.
TC: rv32i_zcmp.c
test_f0:                                           test_f0:
    cm.push   {ra,s0},-32                              cm.push   {ra, s0}, -32
    fsw       fs0,12(sp)                               fsw       fs0,12(sp)
    call      my_getchar                               call      my_getchar
    mv        s0,a0                                    mv        s0,a0
    call      getf                                     call      getf
    fmv.s     fs0,fa0                                  fmv.s     fs0,fa0
    call      my_getchar                               call      my_getchar
    fcvt.s.w  fa5,s0                                   fcvt.s.w  fa5,s0
    fcvt.s.w  fa4,a0                                   fcvt.s.w  fa4,a0
    fadd.s    fa0,fa5,fs0                              fadd.s    fa0,fa5,fs0
    flw       fs0,-20(sp)  // issue: wrong slot        flw       fs0,12(sp)
    fadd.s    fa0,fa0,fa4                              fadd.s    fa0,fa0,fa4
    fcvt.w.s  a0,fa0,rtz                               fcvt.w.s  a0,fa0,rtz
    cm.popret {ra,s0},32                               cm.popret {ra, s0}, 32
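In test_f0, fs0 is stored by fsw fs0,12(sp) after cm.push has moved sp down by 32. It is reloaded before cm.popret moves sp back, so both accesses see the same sp and the reload must also use 12(sp). The sketch below relates sp-relative and CFA-relative offsets while a frame is allocated. REF's -20 equals 12 - 32, which is the slot's CFA-relative offset. That suggests, as an inference from the numbers alone, that a CFA-relative offset was used where an sp-relative one was needed. The helper names are hypothetical.

```c
/* Hypothetical helpers relating sp-relative and CFA-relative offsets while
   FRAME bytes are allocated below the CFA (the sp value on entry).  */

/* A slot at SP_OFF(sp) lies at CFA + (returned value).  */
static int sp_to_cfa_offset (int sp_off, int frame)
{
  return sp_off - frame;
}

/* A slot at CFA + CFA_OFF is addressed as (returned value)(sp).  */
static int cfa_to_sp_offset (int cfa_off, int frame)
{
  return cfa_off + frame;
}
```

With frame = 32, the fs0 slot at 12(sp) is at CFA - 20. Any access made while the frame is still allocated must convert back to 12(sp).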
                
3. REF accesses an incorrect address for an incoming parameter.
TC: zcmp_stack_alignment.c
fool_rv32e:                                        fool_rv32e:
    cm.push   {ra,s0-s1},-32                           cm.push   {ra, s0-s1}, -32
                                                       mv        s0,a0
    sw        a1,12(sp)                                sw        a1,12(sp)
    mv        s0,a0
    sw        a2,8(sp)                                 sw        a2,8(sp)
    sw        a3,4(sp)                                 sw        a3,4(sp)
    sw        a4,0(sp)                                 sw        a4,0(sp)
    mv        s1,a5                                    mv        s1,a5
    call      bar                                      call      bar
    lw        a1,12(sp)                                lw        a1,12(sp)
    lw        a2,8(sp)                                 lw        a2,8(sp)
    lw        a3,4(sp)                                 lw        a3,4(sp)
    lw        a4,0(sp)                                 lw        a4,0(sp)
    add       a0,s0,a1                                 add       a0,s0,a1
    add       a2,a0,a2                                 add       a2,a0,a2
    add       a3,a2,a3                                 add       a3,a2,a3
    lw        a0,28(sp)  // issue: wrong address       lw        a0,32(sp)
    add       a4,a3,a4                                 add       a4,a3,a4
    add       a4,a4,s1                                 add       a4,a4,s1
    add       a0,a4,a0                                 add       a0,a4,a0
    cm.popret {ra,s0-s1},32                            cm.popret {ra, s0-s1}, 32
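An argument passed on the stack lives in the caller's outgoing area. Per the RISC-V calling convention, that area starts at the caller's sp at the call, which is the callee's CFA. After cm.push {ra, s0-s1}, -32 in fool_rv32e, the first stack-passed word is therefore at 32(sp). REF's 28(sp) is CFA - 4, a slot inside the callee's own frame. Under the cm.push layout, that slot holds the highest-numbered pushed register, s1 for {ra, s0-s1}. The helper below is a hypothetical sketch of this arithmetic.

```c
/* Hypothetical helper: sp-relative offset, inside the callee, of the
   stack-passed argument word at index SLOT (0 = the word at the caller's
   sp at the call, i.e. at the callee's CFA), once the prologue has moved
   sp down by FRAME bytes.  BYTES is XLEN / 8.  */
static int incoming_stack_arg_offset (int slot, int frame, int bytes)
{
  return frame + slot * bytes;
}
```

With frame = 32 and bytes = 4, slot 0 is at 32(sp), matching the patched output.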

Fei Gao (2):
  [RISC-V] disable shrink-wrap-separate if zcmp enabled.
  [RISC-V] support cm.push cm.pop cm.popret in zcmp

 gcc/config/riscv/predicates.md                |   6 +
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv.cc                     | 403 ++++++++++++++++--
 gcc/config/riscv/riscv.h                      |  26 ++
 gcc/config/riscv/riscv.md                     |   7 +
 gcc/config/riscv/zc.md                        |  55 +++
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c   | 239 +++++++++++
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c   | 239 +++++++++++
 .../gcc.target/riscv/zcmp_stack_alignment.c   |  23 +
 9 files changed, 960 insertions(+), 41 deletions(-)
 create mode 100644 gcc/config/riscv/zc.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_stack_alignment.c

-- 
2.17.1
