On 09/25/18 18:18, Laszlo Ersek wrote:
> On 09/25/18 16:22, Laszlo Ersek wrote:
>> On 09/13/18 11:50, Ruiyu Ni wrote:
>>
>>> diff --git a/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c 
>>> b/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c
>>> index 5224dd063f..4c4d6e3fc7 100644
>>> --- a/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c
>>> +++ b/MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c
>>> @@ -15,14 +15,12 @@
>>>
>>>
>>>
>>> -
>>>  /**
>>>    Performs an atomic increment of an 32-bit unsigned integer.
>>>
>>>    Performs an atomic increment of the 32-bit unsigned integer specified by
>>>    Value and returns the incremented value. The increment operation must be
>>> -  performed using MP safe mechanisms. The state of the return value is not
>>> -  guaranteed to be MP safe.
>>> +  performed using MP safe mechanisms.
>>>
>>>    @param  Value A pointer to the 32-bit value to increment.
>>>
>>> @@ -38,9 +36,10 @@ InternalSyncIncrement (
>>>    UINT32  Result;
>>>
>>>    __asm__ __volatile__ (
>>> +    "movl    $1, %%eax  \n\t"
>>>      "lock               \n\t"
>>> -    "incl    %2         \n\t"
>>> -    "mov     %2, %%eax      "
>>> +    "xadd    %%eax, %2  \n\t"
>>> +    "inc     %%eax          "
>>>      : "=a" (Result),          // %0
>>>        "=m" (*Value)           // %1
>>>      : "m"  (*Value)           // %2
>>
>> The operand list for the inline assembly is incorrect, I suspect. When I
>> build this code with GCC48 as part of OVMF, for the NOOPT target, then
>> disassemble "CpuMpPei.debug", I get:
>>
>>> 0000000000004383 <InternalSyncIncrement>:
>>> UINT32
>>> EFIAPI
>>> InternalSyncIncrement (
>>>   IN      volatile UINT32    *Value
>>>   )
>>> {
>>>     4383:       55                      push   %rbp
>>>     4384:       48 89 e5                mov    %rsp,%rbp
>>>     4387:       48 83 ec 10             sub    $0x10,%rsp
>>>     438b:       48 89 4d 10             mov    %rcx,0x10(%rbp)
>>>   UINT32  Result;
>>>
>>>   __asm__ __volatile__ (
>>>     438f:       48 8b 55 10             mov    0x10(%rbp),%rdx
>>>     4393:       48 8b 45 10             mov    0x10(%rbp),%rax
>>>     4397:       b8 01 00 00 00          mov    $0x1,%eax
>>>     439c:       f0 0f c1 00             lock xadd %eax,(%rax)
>>>     43a0:       ff c0                   inc    %eax
>>>     43a2:       89 45 fc                mov    %eax,-0x4(%rbp)
>>>     : "m"  (*Value)           // %2
>>>     : "memory",
>>>       "cc"
>>>     );
>>>
>>>   return Result;
>>>     43a5:       8b 45 fc                mov    -0x4(%rbp),%eax
>>> }
>>>     43a8:       c9                      leaveq
>>>     43a9:       c3                      retq
>>>
>>
>> The generated code is trying to use EAX for both (a) the
>> exchange-and-add, i.e.  as the source operand of the XADD instruction,
>> and (b) as (part of) the address of the (*Value) argument.
>>
>> In effect, before we reach the XADD instruction, the least significant
>> dword of Value's address is overwritten with 0x0000_0001. The exchange
>> then happens against that memory location.
>>
>> It is interesting that GCC first spills the parameter from RCX to the
>> stack, and then it loads the parameter from the stack to both RDX and
>> RAX.
>>
>> I think the operand list used with the inline assembly should be updated
>> to prevent GCC from associating %2 with EAX.
>>
>> I'll try to experiment some more.
> 
> The operand lists for almost all of the functions in
> 
> - MdePkg/Library/BaseSynchronizationLib/Ia32/GccInline.c
> - MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c
> 
> are wrong. The InternalSyncIncrement() and InternalSyncDecrement() issue
> has now bitten us in practice. The InternalSyncCompareExchangeXx()
> functions also have incorrect operand lists, they just haven't blown up yet.
> 
> The core problem is that input-output operands for the inline assembly
> templates must be marked as such. The GCC documentation describes two
> ways for that:
> 
> - List the r/w operand only in the output operand list, and use the "+"
> (not "=") constraint.
> 
> - Alternatively, list the operand in both input and output lists, and
> use the "=" constraint in the output list. However, in the input list,
> the constraing must be "N" (a decimal number), where N stands for the
> operand number in the output list. This is how GCC knows those operands
> are actually the same. Using just the same *expression* to fill in the
> two instances (= input and output) of the operand is not guaranteed to
> work; the gcc documentation explicitly highlights this fact.
> 
> In addition, in the ASM template, operands can be referenced by "[name]"
> instead of %N. This has been supported since at least gcc-3.1. This
> makes the template a lot more readable.
> 
> I'm working on a patch set.

Apologies for conversing myself, but now I've managed to review more
background discussion from the mailing list.

In particular, in the following messages:

4A89E2EF3DFEDB4C8BFDE51014F606A14E2F666F@SHSMSX104.ccr.corp.intel.com">http://mid.mail-archive.com/4A89E2EF3DFEDB4C8BFDE51014F606A14E2F666F@SHSMSX104.ccr.corp.intel.com

E92EE9817A31E24EB0585FDF735412F5B8AD687E@ORSMSX113.amr.corp.intel.com">http://mid.mail-archive.com/E92EE9817A31E24EB0585FDF735412F5B8AD687E@ORSMSX113.amr.corp.intel.com

an agreement was reached that ultimately *two* implementations would
remain: compiler intrinsics for the MSFT toolchain family, and the NASM
implementation for *everything else*, including *both* GCC and INTEL
toolchain families.

So, why was version 3 of the patch then accepted and pushed? Because now
we still have *three* implementations (excerpt):

[Sources.X64]
  InterlockedIncrementMsc.c | MSFT
  InterlockedDecrementMsc.c | MSFT

  X64/InterlockedDecrement.nasm | INTEL
  X64/InterlockedIncrement.nasm | INTEL

  X64/GccInline.c | GCC

This is not in the spirit of the agreement on the list.

Why this is relevant to me: because now I realize I shouldn't *fix* the
GCC inline assembly. Instead, we should *remove* it, and consume the
NASM implementation.

So here's what I'm going to do. I will submit a standalone, "surgical"
patch, for fixing the regression introduced in 17634d026f96.
Additionally, I will file a TianoCore BZ about the *other* bugs in
GccInline.c, and suggest that we fix those issues by eliminating the
GCC-specific variant altogether. This way the regression is going to be
averted while we discuss that BZ.

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to