On 08/11/17 12:03, Ard Biesheuvel wrote:
> On 11 August 2017 at 01:34, Laszlo Ersek <ler...@redhat.com> wrote:

>> (2) Unfortunately, the C-language assignment to
>>     "MP_CPU_EXCHANGE_INFO.InitializeFloatingPointUnitsAddress" is
>>     miscompiled under the following circumstances:
>>     - the edk2 build target is DEBUG (or RELEASE),
>>     - the edk2 toolchain selection is GCC5,
>>     - (from the above two it follows that -Os and LTO are enabled,)
>>     - MpInitLib is built for X64,
>>     - and the compiler is gcc-7.1 (shipped in Fedora-26 e.g.).
>>     (If the PEI phase is 64-bit, then MpInitLib breaks in CpuMpPei. If the
>>     PEI phase is 32-bit and the DXE phase is 64-bit, then MpInitLib breaks
>>     in CpuDxe. If the DXE phase is 32-bit, then nothing breaks.)
>>     Below I'll use the NOOPT/GCC5/X64/gcc-7.1 build of CpuMpPei to show
>>     the correct assembly code generated by gcc for the assignment, and
>>     flip the build target to DEBUG for showing the broken assembly code.
>> (3) Correct code for the assignment:
>>>     5406:       48 8d 15 73 24 00 00    lea    0x2473(%rip),%rdx        # 
>>> 7880 <InitializeFloatingPointUnits>
>>>     540d:       48 8b 45 f8             mov    -0x8(%rbp),%rax
>>>     5411:       48 89 90 84 00 00 00    mov    %rdx,0x84(%rax)
>>     Here the absolute address of InitializeFloatingPointUnits() is loaded
>>     into RDX with RIP-relative addressing. Then RAX is pointed to the
>>     ExchangeInfo structure, and finally RDX is stored into
>>     ExchangeInfo->InitializeFloatingPointUnitsAddress (the field offset is
>>     0x84 in the structure).
>>     Here the only relocation happens when CpuMpPei is linked. The distance
>>     between the call site and the callee is calculated and encoded in the
>>     LEA instruction. The same distance is valid when CpuMpPei runs, so the
>>     PEI core doesn't have to relocate the LEA instruction when it loads
>>     CpuMpPei.
>> (4) Incorrect code for the assignment:
>>>     2d69:       48 c7 c2 60 58 00 00    mov    $0x5860,%rdx
>>>     ...
>>>     2d7b:       48 89 95 84 00 00 00    mov    %rdx,0x84(%rbp)
>>>     ...
>>> 0000000000005860 <InitializeFloatingPointUnits>:
>>>     5860:       9b db e3                finit
>>     This is not RIP-relative addressing. Therefore gcc generates a
>>     relocation record for the address encoded in the MOV instruction, at
>>     offset 0x2d69+0x3 -- initial value being 0x5860:
>>> Sections:
>>> Idx Name          Size      VMA               LMA               File off  
>>> Algn
>>>   0 .text         00006e66  0000000000000240  0000000000000240  000000c0  
>>> 2**6
>>> ...
>>> OFFSET           TYPE              VALUE
>>> ...
>>> 0000000000002b2c R_X86_64_32S      InitializeFloatingPointUnits
>>     (The offset of the address to relocate is 0x2d69+3 == 0x2d6c, which
>>     equals the .text section base plus the relocation offset, i.e.,
>>     0x240+0x2b2c.)
>>     This is the only R_X86_64_32S relocation record in the linked-together
>>     CpuMpPei.debug file (still an ELF file). In the rest of the build
>>     process, the GenFw utility translates CpuMpPei.dll to a PE/COFF image
>>     (CpuMpPei.efi), and converts the ELF relocation record to a PE/COFF
>>     relocation record (of type EFI_IMAGE_REL_BASED_HIGHLOW).

> Thanks for the interesting read. One question: is all code potentially
> miscompiled? Or does it only affect code executing in real mode?

The code that is miscompiled is a normal assignment, in normal C code.
The structure type itself is packed [1], and the pointer to access it is
volatile-qualified [2], so those could be contributing factors.

[1] UefiCpuPkg/Library/MpInitLib/MpLib.h

#pragma pack(1)

// MP CPU exchange information for AP reset code
// This structure is required to be packed because fixed field offsets
// into this structure are used in assembly code in this module
typedef struct {
  UINTN                 Lock;
  UINTN                 StackStart;
  UINTN                 StackSize;
  UINTN                 CFunction;
  IA32_DESCRIPTOR       GdtrProfile;
  IA32_DESCRIPTOR       IdtrProfile;
  UINTN                 BufferStart;
  UINTN                 ModeOffset;
  UINTN                 NumApsExecuting;
  UINTN                 CodeSegment;
  UINTN                 DataSegment;
  UINTN                 EnableExecuteDisable;
  UINTN                 Cr3;
  UINTN                 InitFlag;
  CPU_INFO_IN_HOB       *CpuInfo;
  CPU_MP_DATA           *CpuMpData;
  UINTN                 InitializeFloatingPointUnitsAddress;

#pragma pack()

[2] UefiCpuPkg/Library/MpInitLib/MpLib.c

FillExchangeInfoData (
  IN CPU_MP_DATA               *CpuMpData
  volatile MP_CPU_EXCHANGE_INFO    *ExchangeInfo;
  ExchangeInfo->InitializeFloatingPointUnitsAddress =

The only reason the miscompilation affects real mode is that the
assignment populates an "exchange area field" that is consumed by the
application processors from real mode. To be clear, the issue happens on
the population (assignment) side, in normal C code, not on the
consumption side -- the latter is only the victim.

I think *technically* speaking gcc is entitled to generate such an
instruction, because with -mcmodel=small, we promise it that we're never
going to run the binary above 2GB in the address space. And, we do break
that promise, when the PEI core loads the PEIM just below 3GB. What's
interesting is that adding -fpie on top of -mcmodel=small, safety could
be restored (as explained in your commit message) under all gcc's prior
to gcc-7. For some reason, "-fpie" is not enough with gcc-7 + LTO, and
it really takes us up on our "-mcmodel=small" word (about not loading
executables above 2GB).

So, this is a behavioral change in gcc-7, but I'm unsure if we should
call it "gcc bug" or "build flag misconfiguration".

edk2-devel mailing list

Reply via email to