ameisen wrote:

> > This PR doesn't fix (if it was intended to) sibling call optimizations.
> > It _does_ change the code generation for them, however - but it maintains 
> > all of the stack adjustments as well as the `fp`/`sp` moves.
> > ```c++
> > __attribute__((noinline))
> > int foo0()
> > {
> >     volatile int a;
> >     return a += 1;
> > }
> > 
> > int bar0()
> > {
> >     return foo0();
> > }
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > with tail calls disabled (`-std=gnu++23 -O3 -march=mips32r6 -fno-PIC -mllvm 
> > -mips-tail-calls=0`):
> > ```assembly
> >     addiu   $sp, $sp, -24
> >     sw      $ra, 20($sp)                    # 4-byte Folded Spill
> >     sw      $fp, 16($sp)                    # 4-byte Folded Spill
> >     move    $fp, $sp
> >     jal     _Z4foo0v
> >     nop
> >     move    $sp, $fp
> >     lw      $fp, 16($sp)                    # 4-byte Folded Reload
> >     lw      $ra, 20($sp)                    # 4-byte Folded Reload
> >     addiu   $sp, $sp, 24
> >     jrc     $ra
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > with tail calls enabled (`-std=gnu++23 -O3 -march=mips32r6 -fno-PIC -mllvm 
> > -mips-tail-calls=1`):
> > ```assembly
> >     addiu   $sp, $sp, -8
> >     sw      $ra, 4($sp)                     # 4-byte Folded Spill
> >     sw      $fp, 0($sp)                     # 4-byte Folded Spill
> >     move    $fp, $sp
> >     move    $sp, $fp
> >     lw      $fp, 0($sp)                     # 4-byte Folded Reload
> >     lw      $ra, 4($sp)                     # 4-byte Folded Reload
> >     j       _Z4foo0v
> >     addiu   $sp, $sp, 8
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > What is expected:
> > ```assembly
> >     j       _Z4foo0v
> >     nop
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > or
> > ```assembly
> >     bc      _Z4foo0v
> > ```
> > 
> > 
> >     
> >       
> >     
> > 
> >       
> >     
> > 
> >     
> >   
> > So, it's moving the `j` to the end, but it's not cleaning up the stack 
> > changes that were _around_ the jump. Everything in here but the `j` could 
> > be removed (with a `nop`, but the `j` should be replaced with a `bc` 
> > anyways), but this effectively is generating a rather messed up function.
> > A significant amount of logic for handling this issue is present in 
> > `LowerCall` for PPC and for AArch64 (and the rest), but is missing in the 
> > MIPS version.
> 
> Hey @ameisen, thanks a lot. Yes, I agree, it is missing for MIPS case in the 
> ISel Lowering, but I think this should be a separate github issue, and 
> separate PR should be created for this. Can you please create an issue with 
> the content from this comment? Thanks in advance.

I'm trying to figure out how to word the issue, as without your PR clang 
doesn't even turn it into a tail call at all. Since sibling optimizations are 
based upon tail call optimizations (they're a subset of them) it complicates an 
issue as I'm not sure how to write an issue dependent upon a PR.

https://github.com/llvm/llvm-project/pull/161860
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to