jrtc27 added a comment.

In D79916#2279863 <https://reviews.llvm.org/D79916#2279863>, @Bdragon28 wrote:

> In D79916#2279816 <https://reviews.llvm.org/D79916#2279816>, @jrtc27 wrote:
>
>> In D79916#2279812 <https://reviews.llvm.org/D79916#2279812>, @Bdragon28 
>> wrote:
>>
>>> In D79916#2279045 <https://reviews.llvm.org/D79916#2279045>, @jrtc27 wrote:
>>>
>>>> This has significantly regressed FreeBSD's performance with the new 
>>>> version of Clang. It seems Clang does not inline functions at -O1, unlike 
>>>> GCC, and since FreeBSD currently compiles its kernel with -O whenever 
>>>> debug symbols are enabled[1] (which, of course, is almost always true), 
>>>> this results in all its `static inline` helper functions not being inlined 
>>>> at all, a pattern that is common in the kernel, used for things like 
>>>> `get_curthread` and the atomics implementations.
>>>>
>>>> [1] This is a dubious decision made in r140400 in 2005 to provide "truer 
>>>> debugger stack traces" (well, before then there was ping-ponging between 
>>>> -O and -O2 based on concerns around correctness vs performance, but amd64 
>>>> is an exception that has always used -O2 since r127180 it seems). Given 
>>>> that GCC will inline at -O, at least these days, the motivation seems to 
>>>> no longer exist, and compiling a kernel at anything other than -O2 (or 
>>>> maybe -O3) seems like a silly thing to do, but nevertheless it's what is 
>>>> currently done.
>>>>
>>>> Cc: @dim @trasz
>>>
>>> This is actually SUCH a bad idea that a kernel built with -O will *not work 
>>> at all* on 32 bit powerpc platforms (presumably due to allocating stack 
>>> frames in the middle of assembly fragments in the memory management that 
>>> are supposed to be inlined at all times.) I had to hack kern.pre.mk to 
>>> rquest -O2 at all times just to get a functioning kernel.
>>
>> Well, -O0, -O1, -O2 and -O should all produce working kernels, and any cases 
>> where they don't are compiler bugs (or kernel bugs if they rely on UB) that 
>> should be fixed, not worked around by tweaking the compiler flags in a 
>> fragile way until you get the behaviour relied on. Correctness and 
>> performance are very different issues here.
>
> As an example:
>
>   static __inline void
>   mtsrin(vm_offset_t va, register_t value)
>   {
>   
>           __asm __volatile ("mtsrin %0,%1; isync" :: "r"(value), "r"(va));
>   }
>
> This code is used in the mmu when bootstrapping the cpu like so:
>
>   for (i = 0; i < 16; i++)
>           mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]);
>   powerpc_sync();
>   
>   sdr = (u_int)moea_pteg_table | (moea_pteg_mask >> 10);
>   __asm __volatile("mtsdr1 %0" :: "r"(sdr));
>   isync();
>   
>   tlbia();
>
> During the loop there, we are in the middle of programming the MMU segment 
> registers in real mode, and is supposed to be doing all work out of 
> registers. (and powerpc_sync() and isync() should be expanded to their single 
> assembly instruction, not a function call. The whole point of calling those 
> is that we are in an inconsistent hardware state and need to sync up before 
> continuing execution)
>
> If there isn't a way to force inlining, we will have to change to using 
> preprocessor macros in cpufunc.h.

There is, it's called `__attribute__((always_inline))` and supported by both 
GCC and Clang. But at -O0 you'll still have register allocation to deal with, 
so really that code is just fundamentally broken and should not be written in 
C. There is no way for you to guarantee stack spills are not used, it's way out 
of scope for C.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79916/new/

https://reviews.llvm.org/D79916

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to