On 09/25/18 18:18, Laszlo Ersek wrote: > On 09/25/18 16:22, Laszlo Ersek wrote: >> On 09/13/18 11:50, Ruiyu Ni wrote: >> >>> diff --git a/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c >>> b/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c >>> index 5224dd063f..4c4d6e3fc7 100644 >>> --- a/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c >>> +++ b/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c >>> @@ -15,14 +15,12 @@ >>> >>> >>> >>> - >>> /** >>> Performs an atomic increment of an 32-bit unsigned integer. >>> >>> Performs an atomic increment of the 32-bit unsigned integer specified by >>> Value and returns the incremented value. The increment operation must be >>> - performed using MP safe mechanisms. The state of the return value is not >>> - guaranteed to be MP safe. >>> + performed using MP safe mechanisms. >>> >>> @param Value A pointer to the 32-bit value to increment. >>> >>> @@ -38,9 +36,10 @@ InternalSyncIncrement ( >>> UINT32 Result; >>> >>> __asm__ __volatile__ ( >>> + "movl $1, %%eax \n\t" >>> "lock \n\t" >>> - "incl %2 \n\t" >>> - "mov %2, %%eax " >>> + "xadd %%eax, %2 \n\t" >>> + "inc %%eax " >>> : "=a" (Result), // %0 >>> "=m" (*Value) // %1 >>> : "m" (*Value) // %2 >> >> The operand list for the inline assembly is incorrect, I suspect. When I >> build this code with GCC48 as part of OVMF, for the NOOPT target, then >> disassemble "CpuMpPei.debug", I get: >> >>> 0000000000004383 <InternalSyncIncrement>: >>> UINT32 >>> EFIAPI >>> InternalSyncIncrement ( >>> IN volatile UINT32 *Value >>> ) >>> { >>> 4383: 55 push %rbp >>> 4384: 48 89 e5 mov %rsp,%rbp >>> 4387: 48 83 ec 10 sub $0x10,%rsp >>> 438b: 48 89 4d 10 mov %rcx,0x10(%rbp) >>> UINT32 Result; >>> >>> __asm__ __volatile__ ( >>> 438f: 48 8b 55 10 mov 0x10(%rbp),%rdx >>> 4393: 48 8b 45 10 mov 0x10(%rbp),%rax >>> 4397: b8 01 00 00 00 mov $0x1,%eax >>> 439c: f0 0f c1 00 lock xadd %eax,(%rax) >>> 43a0: ff c0 inc %eax >>> 43a2: 89 45 fc mov %eax,-0x4(%rbp) >>> : "m" (*Value) // %2 >>> : "memory", >>> "cc" >>> ); >>> >>> return Result; >>> 43a5: 8b 45 fc mov -0x4(%rbp),%eax >>> } >>> 43a8: c9 leaveq >>> 43a9: c3 retq >>> >> >> The generated code is trying to use EAX for both (a) the >> exchange-and-add, i.e. as the source operand of the XADD instruction, >> and (b) as (part of) the address of the (*Value) argument. >> >> In effect, before we reach the XADD instruction, the least significant >> dword of Value's address is overwritten with 0x0000_0001. The exchange >> then happens against that memory location. >> >> It is interesting that GCC first spills the parameter from RCX to the >> stack, and then it loads the parameter from the stack to both RDX and >> RAX. >> >> I think the operand list used with the inline assembly should be updated >> to prevent GCC from associating %2 with EAX. >> >> I'll try to experiment some more. > > The operand lists for almost all of the functions in > > - MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c > - MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c > > are wrong. The InternalSyncIncrement() and InternalSyncDecrement() issue > has now bitten us in practice. The InternalSyncCompareExchangeXx() > functions also have incorrect operand lists, they just haven't blown up yet. > > The core problem is that input-output operands for the inline assembly > templates must be marked as such. The GCC documentation describes two > ways for that: > > - List the r/w operand only in the output operand list, and use the "+" > (not "=") constraint. > > - Alternatively, list the operand in both input and output lists, and > use the "=" constraint in the output list. However, in the input list, > the constraing must be "N" (a decimal number), where N stands for the > operand number in the output list. This is how GCC knows those operands > are actually the same. Using just the same *expression* to fill in the > two instances (= input and output) of the operand is not guaranteed to > work; the gcc documentation explicitly highlights this fact. > > In addition, in the ASM template, operands can be referenced by "[name]" > instead of %N. This has been supported since at least gcc-3.1. This > makes the template a lot more readable. > > I'm working on a patch set.
Apologies for conversing myself, but now I've managed to review more background discussion from the mailing list. In particular, in the following messages: 4A89E2EF3DFEDB4C8BFDE51014F606A14E2F666F@SHSMSX104.ccr.corp.intel.com">http://mid.mail-archive.com/4A89E2EF3DFEDB4C8BFDE51014F606A14E2F666F@SHSMSX104.ccr.corp.intel.com E92EE9817A31E24EB0585FDF735412F5B8AD687E@ORSMSX113.amr.corp.intel.com">http://mid.mail-archive.com/E92EE9817A31E24EB0585FDF735412F5B8AD687E@ORSMSX113.amr.corp.intel.com an agreement was reached that ultimately *two* implementations would remain: compiler intrinsics for the MSFT toolchain family, and the NASM implementation for *everything else*, including *both* GCC and INTEL toolchain families. So, why was version 3 of the patch then accepted and pushed? Because now we still have *three* implementations (excerpt): [Sources.X64] InterlockedIncrementMsc.c | MSFT InterlockedDecrementMsc.c | MSFT X64/InterlockedDecrement.nasm | INTEL X64/InterlockedIncrement.nasm | INTEL X64/GccInline.c | GCC This is not in the spirit of the agreement on the list. Why this is relevant to me: because now I realize I shouldn't *fix* the GCC inline assembly. Instead, we should *remove* it, and consume the NASM implementation. So here's what I'm going to do. I will submit a standalone, "surgical" patch, for fixing the regression introduced in 17634d026f96. Additionally, I will file a TianoCore BZ about the *other* bugs in GccInline.c, and suggest that we fix those issues by eliminating the GCC-specific variant altogether. This way the regression is going to be averted while we discuss that BZ. Thanks Laszlo _______________________________________________ edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel