> On Mar 5, 2019, at 1:32 PM, Gao, Liming <liming....@intel.com> wrote:
> 
> Andrew:
>  BZ 1163 is to remove inline X86 assembly code in C source files. But this 
> patch is wrong. I have given my comments to update this patch.
> 

Why do we want to remove inline X86 assembly? As I mentioned, it will lead to 
larger binaries, slower binaries, and less optimized code.

For example take this simple inline assembly function:

VOID
EFIAPI
CpuBreakpoint (
  VOID
  )
{
  __asm__ __volatile__ ("int $3");
}

Today, with clang LTO turned on, calling CpuBreakpoint() looks like this from 
the caller's point of view. 

a.out[0x1fa5] <+6>:  cc              int3   

But if we move that to NASM:

a.out[0x1fa6] <+7>:  e8 07 00 00 00  calll  0x1fb2    ; CpuBreakpoint

plus:
a.out`CpuBreakpoint:
a.out[0x1fb2] <+0>: cc     int3   
a.out[0x1fb3] <+1>: c3     retl   
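
For reference, the NASM replacement looks roughly like this (paraphrased; the 
details of the real BaseLib .nasm file may differ):

    SECTION .text

;------------------------------------------------------------------------------
; VOID
; EFIAPI
; CpuBreakpoint (
;   VOID
;   );
;------------------------------------------------------------------------------
global ASM_PFX(CpuBreakpoint)
ASM_PFX(CpuBreakpoint):
    int  3      ; the same one-byte breakpoint
    ret         ; but now it sits behind a call/ret

The compiler only sees an external symbol, so it has to emit the call above 
instead of just dropping the int3 inline.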

And there is also an extra push and pop on the stack. The other issue is that 
a call to a function the compiler cannot see into acts like a _ReadWriteBarrier 
(yikes, I see _ReadWriteBarrier is deprecated in VC++ 2017). Is the deprecation 
of some of these intrinsics in VC++ driving the removal of inline assembly? For 
GCC, inline assembly works great for a local compile, and for clang LTO it 
participates in whole-program optimization.
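
To illustrate the barrier point, this is roughly the explicit version of what 
an opaque call forces on the optimizer (a minimal sketch, not EDK2 code):

  //
  // An empty asm statement with a "memory" clobber: GCC/clang will not cache
  // or reorder memory accesses across this point. A call to a function the
  // optimizer cannot see into makes it assume the same thing, whether you
  // want that or not.
  //
  __asm__ __volatile__ ("" : : : "memory");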

The LTO bitcode includes the inline assembly and its constraints, so the 
optimizer knows what to do with it, and it gets optimized just like the rest of 
the abstract bitcode during Link Time Optimization. 

This is CpuBreakpoint():
; Function Attrs: noinline nounwind optnone ssp uwtable
define void @CpuBreakpoint() #0 {
  call void asm sideeffect "int $$3", "~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
  ret void
}

This is AsmReadMsr64():
; Function Attrs: noinline nounwind optnone ssp uwtable
define i64 @AsmReadMsr64(i32) #0 {
  %2 = alloca i32, align 4
  %3 = alloca i64, align 8
  store i32 %0, i32* %2, align 4
  %4 = load i32, i32* %2, align 4
  %5 = call i64 asm sideeffect "rdmsr", "=A,{cx},~{dirflag},~{fpsr},~{flags}"(i32 %4) #2, !srcloc !4
  store i64 %5, i64* %3, align 8
  %6 = load i64, i64* %3, align 8
  ret i64 %6
}
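
For comparison, the inline-assembly C source that produces that IR looks 
roughly like this (paraphrased from the IA32 BaseLib version; "=A" is the 
edx:eax register pair and "c" is ecx, which is where the "=A,{cx}" constraint 
string above comes from):

UINT64
EFIAPI
AsmReadMsr64 (
  IN UINT32  Index
  )
{
  UINT64  Data;

  __asm__ __volatile__ (
    "rdmsr"                // read the MSR selected by ecx into edx:eax
    : "=A" (Data)          // output: 64-bit result in edx:eax
    : "c"  (Index)         // input:  MSR index in ecx
    );

  return Data;
}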


I agree it makes sense to remove the .S and .asm files and only have .nasm files. 

Thanks,

Andrew Fish

PS For Xcode clang, since it emits frame pointers by default, we need to add 
the extra instructions to save the 4-byte frame pointer in the assembly 
functions so the stack can get unwound. 

a.out`CpuBreakpoint:
a.out[0x1f99] <+0>: 55     pushl  %ebp
a.out[0x1f9a] <+1>: 89 e5  movl   %esp, %ebp
a.out[0x1f9c] <+3>: cc     int3   
a.out[0x1f9d] <+4>: 5d     popl   %ebp
a.out[0x1f9e] <+5>: c3     retl   

So the breakpoint goes from 1 byte to 11 bytes (a 5-byte call at the call site 
plus the 6-byte function above) if we get rid of the intrinsics. 

>  The change is to reduce the duplicated implementation. It will be good for 
> code maintenance. Recently, one patch had to update both the .c and .nasm 
> files. Please see 
> https://lists.01.org/pipermail/edk2-devel/2019-February/037165.html. Besides 
> this change, I propose to remove all GAS assembly code for the IA32 and X64 
> arches. After those changes, the patch owner will collect the impact on the 
> image size. 
> 
> Thanks
> Liming
