BaseSynchronizationLib: Remove inline X86 assembly code

Yao, Jiewen Wed, 06 Mar 2019 21:07:38 -0800

HI Mike and Andrew
The problem is maintainability.

If we have multiple copy of implementation, a developer need verify multiple 
copy of implementation, if we make update. Before that, a developer has to be 
aware that there is multiple copy of implementation. - That increases the 
complexity.


If we have everything, there MAY be 5 copy - ASM, NASM, S, GCC inline, MS 
inline, theoretically.
Now, we remove ASM. It is good first step.
But we may still have 4 copies. I suggest we consider do more.

A recently case that SetJump/LongJump, I updated the NASM file for both IA32 
and X64.

Later, to my surprise, only X64 version NASM works, and IA32 version NASM does 
not work.
Then I notice that there is a different copy if IA32 Jump.c MS inline - I also 
need update. That is frustrated.

I think there should be a balance between optimization and code 
readability/maintainability.

Do we have data on what size benefit we can get with these inline function, 
from whole image level?
We can do evaluation at function level, case by case.
If we see a huge size benefit, I agree to keep this function.
If we see no size benefit, I suggest we do the cleanup for this function.


Thank you
Yao Jiewen



> -----Original Message-----
> From: edk2-devel [mailto:[email protected]] On Behalf Of
> Andrew Fish via edk2-devel
> Sent: Wednesday, March 6, 2019 5:31 PM
> To: Kinney, Michael D <[email protected]>
> Cc: edk2-devel-01 <[email protected]>; Gao, Liming
> <[email protected]>
> Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove
> inline X86 assembly code
> 
> 
> 
> > On Mar 6, 2019, at 4:41 PM, Kinney, Michael D
> <[email protected]> wrote:
> >
> > Liming,
> >
> > That does not work either because inline assembly uses compiler specific
> syntax.
> >
> > Please update with the specific list of functions that you think the inline
> should be removed to improve maintenance with no impacts in size/perf.
> >
> 
> Mike,
> 
> It is easy to find the gcc/clang inline assembler, just `git grep "__asm__
> __volatile__"`
> 
> The main files are:
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseIoLib
> Intrinsic/IoLibGcc.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/X
> 64/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseLib/I
> a32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSync
> hronizationLib/Ia32/GccInline.c
> https://github.com/tianocore/edk2/blob/master/MdePkg/Library/BaseSync
> hronizationLib/X64/GccInline.c
> 
> This is really just compiler optimizer control.
> Library/BaseSynchronizationLib/SynchronizationGcc.c:21:#define
> _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
> 
> Looks like this is working around the alignment ASSERT in BaseIoLibIntrinsic.
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:43:  __asm__ __volatile__
> ( "outl %0, %1" : : "a" (Value), "d" ((UINT16)Port) );
> OvmfPkg/QemuVideoDxe/UnalignedIoGcc.c:67:  __asm__ __volatile__
> ( "inl %1, %0" : "=a" (Data) : "d" ((UINT16)Port) );
> 
> It seems like we have a reasonable set and should not use the inline
> assembly for any more complex code.
> 
> Thanks,
> 
> Andrew Fish
> 
> > The issue you pointed to was around SetJump()/LongJump().  Can we limit
> this BZ to only those 2 functions?
> >
> > Mike
> >   <>
> > From: Gao, Liming
> > Sent: Wednesday, March 6, 2019 4:28 PM
> > To: [email protected]
> > Cc: Zhang, Shenglei <[email protected]>; edk2-devel-01
> <[email protected]>; Kinney, Michael D
> <[email protected]>
> > Subject: RE: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove
> inline X86 assembly code
> >
> > Andrew:
> >   I want to keep only one implementation. If inline assembly c source is
> preferred, I suggest to remove its nasm implementation.
> >
> > Thanks
> > Liming
> >  <>From: [email protected] <mailto:[email protected]>
> [mailto:[email protected] <mailto:[email protected]>]
> > Sent: Tuesday, March 5, 2019 2:44 PM
> > To: Gao, Liming <[email protected] <mailto:[email protected]>>
> > Cc: Zhang, Shenglei <[email protected]
> <mailto:[email protected]>>; edk2-devel-01
> <[email protected] <mailto:[email protected]>>; Kinney,
> Michael D <[email protected]
> <mailto:[email protected]>>
> > Subject: Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove
> inline X86 assembly code
> >
> >
> >
> >
> > On Mar 5, 2019, at 1:32 PM, Gao, Liming <[email protected]
> <mailto:[email protected]>> wrote:
> >
> > Andrew:
> >  BZ 1163 is to remove inline X86 assembly code in C source file. But, this
> patch is wrong. I have gave my comments to update this patch.
> >
> >
> > Why do we want to remove inline X86 assembly. As I mentioned it will lead
> to larger binaries, slower binaries, and less optimized code.
> >
> > For example take this simple inline assembly function:
> >
> > VOID
> > EFIAPI
> > CpuBreakpoint (
> >   VOID
> >   )
> > {
> >   __asm__ __volatile__ ("int $3");
> > }
> >
> >
> > Today with clang LTO turned on calling CpuBreakpoint() looks like this from
> the callers point of view.
> >
> > a.out[0x1fa5] <+6>:  cc              int3
> >
> >
> > But if we move that to NASM:
> >
> >
> > a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll
> 0x1fb2                    ; CpuBreakpoint
> >
> >
> > plus:
> > a.out`CpuBreakpoint:
> > a.out[0x1fb2] <+0>: cc     int3
> > a.out[0x1fb3] <+1>: c3     retl
> >
> >
> > And there is also an extra push and pop on the stack. The other issue is the
> call to the function that is unknown to the compiler will act like a
> _ReadWriteBarrier (Yikes I see _ReadWriteBarrier is deprecated in VC++
> 2017). Is the depreciation of some of these intrinsics in VC++ driving the
> removal of inline assembly? For GCC inline assembly works great for local
> compile, and for clang LTO it works in entire program optimization.
> >
> > The LTO bitcode includes inline assembly and constraints so that the
> optimizer knows what to do so it can get optimized just like the abstract
> bitcode during the Link Time Optimization.
> >
> > This is CpuBreakpoint():
> > ; Function Attrs: noinline nounwind optnone ssp uwtable
> > define void @CpuBreakpoint() #0 {
> >   call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"()
> #2, !srcloc !3
> >   ret void
> > }
> >
> >
> > This is AsmReadMsr64():
> > ; Function Attrs: noinline nounwind optnone ssp uwtable
> > define i64 @AsmReadMsr64(i32) #0 {
> >   %2 = alloca i32, align 4
> >   %3 = alloca i64, align 8
> >   store i32 %0, i32* %2, align 4
> >   %4 = load i32, i32* %2, align 4
> >   %5 = call i64 asm sideeffect "rdmsr",
> "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
> >   store i64 %5, i64* %3, align 8
> >   %6 = load i64, i64* %3, align 8
> >   ret i64 %6
> > }
> >
> >
> >
> > I agree it make sense to remove .S and .asm files and only have .nasm files.
> >
> > Thanks,
> >
> > Andrew Fish
> >
> > PS For the Xcode clang since it emits frame pointers by default we need to
> add the extra 4 bytes to save the assembly functions so the stack can get
> unwound.
> >
> > a.out`CpuBreakpoint:
> > a.out[0x1f99] <+0>: 55     pushl  %ebp
> > a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
> > a.out[0x1f9c] <+3>: cc     int3
> > a.out[0x1f9d] <+4>: 5d     popl   %ebp
> > a.out[0x1f9e] <+5>: c3     retl
> >
> >
> > So breakpoint goes from 1 byte to 11 bytes if we get rid of the intrinsics.
> >
> >
> >  The change is to reduce the duplicated implementation. It will be good
> on the code maintain. Recently, one patch has to update .c and .nasm both.
> Please see
> https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html
> <https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html>.
> Beside this change, I propose to remove all GAS assembly code for IA32 and
> X64 arch. After those change, the patch owner will collect the impact of the
> image size.
> >
> > Thanks
> > Liming
> 
> _______________________________________________
> edk2-devel mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/edk2-devel
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Re: [edk2] [PATCH 3/3] MdePkg/BaseSynchronizationLib: Remove inline X86 assembly code

Reply via email to