HI Mike and Andrew The problem is maintainability. If we have multiple copy of implementation, a developer need verify multiple copy of implementation, if we make update. Before that, a developer has to be aware that there is multiple copy of implementation. - That increases the complexity.
If we have everything, there MAY be 5 copy - ASM, NASM, S, GCC inline, MS inline, theoretically. Now, we remove ASM. It is good first step. But we may still have 4 copies. I suggest we consider do more. A recently case that SetJump/LongJump, I updated the NASM file for both IA32 and X64. Later, to my surprise, only X64 version NASM works, and IA32 version NASM does not work. Then I notice that there is a different copy if IA32 Jump.c MS inline - I also need update. That is frustrated. I think there should be a balance between optimization and code readability/maintainability. Do we have data on what size benefit we can get with these inline function, from whole image level? We can do evaluation at function level, case by case. If we see a huge size benefit, I agree to keep this function. If we see no size benefit, I suggest we do the cleanup for this function. Thank you Yao Jiewen > -----Original Message----- > From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of > Andrew Fish via edk2-devel > Sent: Wednesday, March 6, 2019 5:31 PM > To: Kinney, Michael D <michael.d.kin...@intel.com> > Cc: edk2-devel-01 <edk2-devel@lists.01.org>; Gao, Liming > <liming....@intel.com> > Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove > inline X86 assembly code > > > > > On Mar 6, 2019, at 4:41 PM, Kinney, Michael D > <michael.d.kin...@intel.com> wrote: > > > > Liming, > > > > That does not work either because inline assembly uses compiler specific > syntax. > > > > Please update with the specific list of functions that you think the inline > should be removed to improve maintenance with no impacts in size/perf. > > > > Mike, > > It is easy to find the gcc/clang inline assembler, just `git grep "__asm__ > __volatile__"` > > The main files are: > https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLib > Intrinsic/IoLibGcc.c > https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X > 64/GccInline.c > https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/I > a32/GccInline.c > https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSync > hronizationLib/Ia32/GccInline.c > https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSync > hronizationLib/X64/GccInline.c > > This is really just compiler optimizer control. > Library/BaseSynchronizationLib/SynchronizationGcc.c:21:#define > _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0) > > Looks like this is working around the alignment ASSERT in BaseIoLibIntrinsic. > OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43: __asm__ __volatile__ > ( "outl %0, %1" : : "a" (Value), "d" ((UINT16)Port) ); > OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67: __asm__ __volatile__ > ( "inl %1, %0" : "=a" (Data) : "d" ((UINT16)Port) ); > > It seems like we have a reasonable set and should not use the inline > assembly for any more complex code. > > Thanks, > > Andrew Fish > > > The issue you pointed to was around SetJump()/LongJump(). Can we limit > this BZ to only those 2 functions? > > > > Mike > > <> > > From: Gao, Liming > > Sent: Wednesday, March 6, 2019 4:28 PM > > To: af...@apple.com > > Cc: Zhang, Shenglei <shenglei.zh...@intel.com>; edk2-devel-01 > <edk2-devel@lists.01.org>; Kinney, Michael D > <michael.d.kin...@intel.com> > > Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove > inline X86 assembly code > > > > Andrew: > > I want to keep only one implementation. If inline assembly c source is > preferred, I suggest to remove its nasm implementation. > > > > Thanks > > Liming > > <>From: af...@apple.com <mailto:af...@apple.com> > [mailto:af...@apple.com <mailto:af...@apple.com>] > > Sent: Tuesday, March 5, 2019 2:44 PM > > To: Gao, Liming <liming....@intel.com <mailto:liming....@intel.com>> > > Cc: Zhang, Shenglei <shenglei.zh...@intel.com > <mailto:shenglei.zh...@intel.com>>; edk2-devel-01 > <edk2-devel@lists.01.org <mailto:edk2-devel@lists.01.org>>; Kinney, > Michael D <michael.d.kin...@intel.com > <mailto:michael.d.kin...@intel.com>> > > Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove > inline X86 assembly code > > > > > > > > > > On Mar 5, 2019, at 1:32 PM, Gao, Liming <liming....@intel.com > <mailto:liming....@intel.com>> wrote: > > > > Andrew: > > BZ 1163 is to remove inline X86 assembly code in C source file. But, this > patch is wrong. I have gave my comments to update this patch. > > > > > > Why do we want to remove inline X86 assembly. As I mentioned it will lead > to larger binaries, slower binaries, and less optimized code. > > > > For example take this simple inline assembly function: > > > > VOID > > EFIAPI > > CpuBreakpoint ( > > VOID > > ) > > { > > __asm__ __volatile__ ("int $3"); > > } > > > > > > Today with clang LTO turned on calling CpuBreakpoint() looks like this from > the callers point of view. > > > > a.out[0x1fa5] <+6>: cc int3 > > > > > > But if we move that to NASM: > > > > > > a.out[0x1fa6] <+7>: e8 07 00 00 00 calll > 0x1fb2 ; CpuBreakpoint > > > > > > plus: > > a.out`CpuBreakpoint: > > a.out[0x1fb2] <+0>: cc int3 > > a.out[0x1fb3] <+1>: c3 retl > > > > > > And there is also an extra push and pop on the stack. The other issue is the > call to the function that is unknown to the compiler will act like a > _ReadWriteBarrier (Yikes I see _ReadWriteBarrier is deprecated in VC++ > 2017). Is the depreciation of some of these intrinsics in VC++ driving the > removal of inline assembly? For GCC inline assembly works great for local > compile, and for clang LTO it works in entire program optimization. > > > > The LTO bitcode includes inline assembly and constraints so that the > optimizer knows what to do so it can get optimized just like the abstract > bitcode during the Link Time Optimization. > > > > This is CpuBreakpoint(): > > ; Function Attrs: noinline nounwind optnone ssp uwtable > > define void @CpuBreakpoint() #0 { > > call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() > #2, !srcloc !3 > > ret void > > } > > > > > > This is AsmReadMsr64(): > > ; Function Attrs: noinline nounwind optnone ssp uwtable > > define i64 @AsmReadMsr64(i32) #0 { > > %2 = alloca i32, align 4 > > %3 = alloca i64, align 8 > > store i32 %0, i32* %2, align 4 > > %4 = load i32, i32* %2, align 4 > > %5 = call i64 asm sideeffect "rdmsr", > "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4 > > store i64 %5, i64* %3, align 8 > > %6 = load i64, i64* %3, align 8 > > ret i64 %6 > > } > > > > > > > > I agree it make sense to remove .S and .asm files and only have .nasm files. > > > > Thanks, > > > > Andrew Fish > > > > PS For the Xcode clang since it emits frame pointers by default we need to > add the extra 4 bytes to save the assembly functions so the stack can get > unwound. > > > > a.out`CpuBreakpoint: > > a.out[0x1f99] <+0>: 55 pushl %ebp > > a.out[0x1f9a] <+1>: 89 e5 movl %esp, %ebp > > a.out[0x1f9c] <+3>: cc int3 > > a.out[0x1f9d] <+4>: 5d popl %ebp > > a.out[0x1f9e] <+5>: c3 retl > > > > > > So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics. > > > > > > The change is to reduce the duplicated implementation. It will be good > on the code maintain. Recently, one patch has to update .c and .nasm both. > Please see > https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html > <https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html>. > Beside this change, I propose to remove all GAS assembly code for IA32 and > X64 arch. After those change, the patch owner will collect the impact of the > image size. > > > > Thanks > > Liming > > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel