> On Mar 6, 2019, at 4:41 PM, Kinney, Michael D <michael.d.kin...@intel.com> > wrote: > > Liming, > > That does not work either because inline assembly uses compiler specific > syntax. > > Please update with the specific list of functions that you think the inline > should be removed to improve maintenance with no impacts in size/perf. >
Mike, It is easy to find the gcc/clang inline assembler, just `git grep "__asm__ __volatile__"` The main files are: https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLibIntrinsic/IoLibGcc.c https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X64/GccInline.c https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/Ia32/GccInline.c https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c This is really just compiler optimizer control. Library/BaseSynchronizationLib/SynchronizationGcc.c:21:#define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0) Looks like this is working around the alignment ASSERT in BaseIoLibIntrinsic. OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43: __asm__ __volatile__ ( "outl %0, %1" : : "a" (Value), "d" ((UINT16)Port) ); OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67: __asm__ __volatile__ ( "inl %1, %0" : "=a" (Data) : "d" ((UINT16)Port) ); It seems like we have a reasonable set and should not use the inline assembly for any more complex code. Thanks, Andrew Fish > The issue you pointed to was around SetJump()/LongJump(). Can we limit this > BZ to only those 2 functions? > > Mike > <> > From: Gao, Liming > Sent: Wednesday, March 6, 2019 4:28 PM > To: af...@apple.com > Cc: Zhang, Shenglei <shenglei.zh...@intel.com>; edk2-devel-01 > <edk2-devel@lists.01.org>; Kinney, Michael D <michael.d.kin...@intel.com> > Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline > X86 assembly code > > Andrew: > I want to keep only one implementation. If inline assembly c source is > preferred, I suggest to remove its nasm implementation. > > Thanks > Liming > <>From: af...@apple.com <mailto:af...@apple.com> [mailto:af...@apple.com > <mailto:af...@apple.com>] > Sent: Tuesday, March 5, 2019 2:44 PM > To: Gao, Liming <liming....@intel.com <mailto:liming....@intel.com>> > Cc: Zhang, Shenglei <shenglei.zh...@intel.com > <mailto:shenglei.zh...@intel.com>>; edk2-devel-01 <edk2-devel@lists.01.org > <mailto:edk2-devel@lists.01.org>>; Kinney, Michael D > <michael.d.kin...@intel.com <mailto:michael.d.kin...@intel.com>> > Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline > X86 assembly code > > > > > On Mar 5, 2019, at 1:32 PM, Gao, Liming <liming....@intel.com > <mailto:liming....@intel.com>> wrote: > > Andrew: > BZ 1163 is to remove inline X86 assembly code in C source file. But, this > patch is wrong. I have gave my comments to update this patch. > > > Why do we want to remove inline X86 assembly. As I mentioned it will lead to > larger binaries, slower binaries, and less optimized code. > > For example take this simple inline assembly function: > > VOID > EFIAPI > CpuBreakpoint ( > VOID > ) > { > __asm__ __volatile__ ("int $3"); > } > > > Today with clang LTO turned on calling CpuBreakpoint() looks like this from > the callers point of view. > > a.out[0x1fa5] <+6>: cc int3 > > > But if we move that to NASM: > > > a.out[0x1fa6] <+7>: e8 07 00 00 00 calll 0x1fb2 ; > CpuBreakpoint > > > plus: > a.out`CpuBreakpoint: > a.out[0x1fb2] <+0>: cc int3 > a.out[0x1fb3] <+1>: c3 retl > > > And there is also an extra push and pop on the stack. The other issue is the > call to the function that is unknown to the compiler will act like a > _ReadWriteBarrier (Yikes I see _ReadWriteBarrier is deprecated in VC++ 2017). > Is the depreciation of some of these intrinsics in VC++ driving the removal > of inline assembly? For GCC inline assembly works great for local compile, > and for clang LTO it works in entire program optimization. > > The LTO bitcode includes inline assembly and constraints so that the > optimizer knows what to do so it can get optimized just like the abstract > bitcode during the Link Time Optimization. > > This is CpuBreakpoint(): > ; Function Attrs: noinline nounwind optnone ssp uwtable > define void @CpuBreakpoint() #0 { > call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, > !srcloc !3 > ret void > } > > > This is AsmReadMsr64(): > ; Function Attrs: noinline nounwind optnone ssp uwtable > define i64 @AsmReadMsr64(i32) #0 { > %2 = alloca i32, align 4 > %3 = alloca i64, align 8 > store i32 %0, i32* %2, align 4 > %4 = load i32, i32* %2, align 4 > %5 = call i64 asm sideeffect "rdmsr", > "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4 > store i64 %5, i64* %3, align 8 > %6 = load i64, i64* %3, align 8 > ret i64 %6 > } > > > > I agree it make sense to remove .S and .asm files and only have .nasm files. > > Thanks, > > Andrew Fish > > PS For the Xcode clang since it emits frame pointers by default we need to > add the extra 4 bytes to save the assembly functions so the stack can get > unwound. > > a.out`CpuBreakpoint: > a.out[0x1f99] <+0>: 55 pushl %ebp > a.out[0x1f9a] <+1>: 89 e5 movl %esp, %ebp > a.out[0x1f9c] <+3>: cc int3 > a.out[0x1f9d] <+4>: 5d popl %ebp > a.out[0x1f9e] <+5>: c3 retl > > > So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics. > > > The change is to reduce the duplicated implementation. It will be good on > the code maintain. Recently, one patch has to update .c and .nasm both. > Please see > https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html > <https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html>. Beside > this change, I propose to remove all GAS assembly code for IA32 and X64 arch. > After those change, the patch owner will collect the impact of the image > size. > > Thanks > Liming _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel