BaseSynchronizationLib: Remove inline X86 assembly code

Andrew Fish via edk2-devel Wed, 06 Mar 2019 17:31:44 -0800


> On Mar 6, 2019, at 4:41 PM, Kinney, Michael D <michael.d.kin...@intel.com> 
> wrote:
> 
> Liming,
>  
> That does not work either because inline assembly uses compiler specific 
> syntax.
>  
> Please update with the specific list of functions that you think the inline 
> should be removed to improve maintenance with no impacts in size/perf.
>


Mike,

It is easy to find the gcc/clang inline assembler, just `git grep "__asm__ 
__volatile__"`

The main files are:
https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLibIntrinsic/IoLibGcc.c
https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X64/GccInline.c
https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/Ia32/GccInline.c
https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c

This is really just compiler optimizer control. 
Library/BaseSynchronizationLib/SynchronizationGcc.c:21:#define 
_ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)

Looks like this is working around the alignment ASSERT in BaseIoLibIntrinsic. 
OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43:  __asm__ __volatile__ ( "outl %0, %1" 
: : "a" (Value), "d" ((UINT16)Port) );
OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67:  __asm__ __volatile__ ( "inl %1, %0" 
: "=a" (Data) : "d" ((UINT16)Port) );

It seems like we have a reasonable set and should not use the inline assembly 
for any more complex code. 

Thanks,

Andrew Fish

> The issue you pointed to was around SetJump()/LongJump().  Can we limit this 
> BZ to only those 2 functions?
>  
> Mike
>   <>
> From: Gao, Liming 
> Sent: Wednesday, March 6, 2019 4:28 PM
> To: af...@apple.com
> Cc: Zhang, Shenglei <shenglei.zh...@intel.com>; edk2-devel-01 
> <edk2-devel@lists.01.org>; Kinney, Michael D <michael.d.kin...@intel.com>
> Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline 
> X86 assembly code
>  
> Andrew:
>   I want to keep only one implementation. If inline assembly c source is 
> preferred, I suggest to remove its nasm implementation.
>  
> Thanks
> Liming
>  <>From: af...@apple.com <mailto:af...@apple.com> [mailto:af...@apple.com 
> <mailto:af...@apple.com>] 
> Sent: Tuesday, March 5, 2019 2:44 PM
> To: Gao, Liming <liming....@intel.com <mailto:liming....@intel.com>>
> Cc: Zhang, Shenglei <shenglei.zh...@intel.com 
> <mailto:shenglei.zh...@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org 
> <mailto:edk2-devel@lists.01.org>>; Kinney, Michael D 
> <michael.d.kin...@intel.com <mailto:michael.d.kin...@intel.com>>
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline 
> X86 assembly code
>  
>  
>  
> 
> On Mar 5, 2019, at 1:32 PM, Gao, Liming <liming....@intel.com 
> <mailto:liming....@intel.com>> wrote:
>  
> Andrew:
>  BZ 1163 is to remove inline X86 assembly code in C source file. But, this 
> patch is wrong. I have gave my comments to update this patch.
> 
>  
> Why do we want to remove inline X86 assembly. As I mentioned it will lead to 
> larger binaries, slower binaries, and less optimized code.
>  
> For example take this simple inline assembly function:
>  
> VOID
> EFIAPI
> CpuBreakpoint (
>   VOID
>   )
> {
>   __asm__ __volatile__ ("int $3");
> }
>  
> 
> Today with clang LTO turned on calling CpuBreakpoint() looks like this from 
> the callers point of view. 
>  
> a.out[0x1fa5] <+6>:  cc              int3   
>  
> 
> But if we move that to NASM:
>  
> 
> a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll  0x1fb2                    ; 
> CpuBreakpoint
>  
> 
> plus:
> a.out`CpuBreakpoint:
> a.out[0x1fb2] <+0>: cc     int3   
> a.out[0x1fb3] <+1>: c3     retl   
>  
> 
> And there is also an extra push and pop on the stack. The other issue is the 
> call to the function that is unknown to the compiler will act like a 
> _ReadWriteBarrier (Yikes I see _ReadWriteBarrier is deprecated in VC++ 2017). 
> Is the depreciation of some of these intrinsics in VC++ driving the removal 
> of inline assembly? For GCC inline assembly works great for local compile, 
> and for clang LTO it works in entire program optimization.
>  
> The LTO bitcode includes inline assembly and constraints so that the 
> optimizer knows what to do so it can get optimized just like the abstract 
> bitcode during the Link Time Optimization. 
>  
> This is CpuBreakpoint():
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define void @CpuBreakpoint() #0 {
>   call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, 
> !srcloc !3
>   ret void
> }
>  
> 
> This is AsmReadMsr64():
> ; Function Attrs: noinline nounwind optnone ssp uwtable
> define i64 @AsmReadMsr64(i32) #0 {
>   %2 = alloca i32, align 4
>   %3 = alloca i64, align 8
>   store i32 %0, i32* %2, align 4
>   %4 = load i32, i32* %2, align 4
>   %5 = call i64 asm sideeffect "rdmsr", 
> "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
>   store i64 %5, i64* %3, align 8
>   %6 = load i64, i64* %3, align 8
>   ret i64 %6
> }
>  
> 
>  
> I agree it make sense to remove .S and .asm files and only have .nasm files. 
>  
> Thanks,
>  
> Andrew Fish
>  
> PS For the Xcode clang since it emits frame pointers by default we need to 
> add the extra 4 bytes to save the assembly functions so the stack can get 
> unwound. 
>  
> a.out`CpuBreakpoint:
> a.out[0x1f99] <+0>: 55     pushl  %ebp
> a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
> a.out[0x1f9c] <+3>: cc     int3   
> a.out[0x1f9d] <+4>: 5d     popl   %ebp
> a.out[0x1f9e] <+5>: c3     retl   
>  
> 
> So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics. 
>  
> 
>  The change is to reduce the duplicated implementation. It will be good on 
> the code maintain. Recently, one patch has to update .c and .nasm both. 
> Please see 
> https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html 
> <https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html>. Beside 
> this change, I propose to remove all GAS assembly code for IA32 and X64 arch. 
> After those change, the patch owner will collect the impact of the image 
> size. 
> 
> Thanks
> Liming

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code

Reply via email to