On 11/09/17 22:11, Ard Biesheuvel wrote:
> On 7 November 2017 at 18:13, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>> On 7 November 2017 at 18:09, Laszlo Ersek <ler...@redhat.com> wrote:
>>> On 11/05/17 17:29, Ard Biesheuvel wrote:
>>>> On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>>>>> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindh...@linaro.org> wrote:
>>>>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>>>>> For example,
>>>>>>>
>>>>>>>   Total temporary memory:    16352 bytes.
>>>>>>>     temporary memory stack ever used:       4820 bytes.
>>>>>>>     temporary memory heap used for HobList: 4720 bytes.
>>>>>>>
>>>>>>> Tracking stack utilization like this requires the stack to be seeded
>>>>>>> with a known magic value, and this needs to occur before entering C
>>>>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>>>>> to implement this feature, but it is useful nonetheless, so let's
>>>>>>> wire it up for PrePeiCore as well.
>>>>>>>
>>>>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
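The mechanism described in the commit message can be modeled in a few lines of C: seed the whole temporary stack region with a magic word, then, before switching to permanent RAM, scan up from the lowest address until the first word that no longer matches. This is a minimal sketch over a plain array, not the PeiCore or PrePeiCore code. The magic value 0x5AA55AA5 (the usual PcdInitValueInTempStack default, as far as I recall) and the function names are assumptions, and the stack is assumed to grow downward from the top of the region. In real firmware the seeding has to happen in assembly, as the patch does, because any C function would itself be using the stack it is seeding.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed magic value, standing in for PcdInitValueInTempStack. */
#define TEMP_STACK_MAGIC UINT32_C(0x5AA55AA5)

/* Fill every 32-bit word of the region with the magic value. */
static void seed_stack(uint32_t *base, size_t words)
{
  for (size_t i = 0; i < words; i++) {
    base[i] = TEMP_STACK_MAGIC;
  }
}

/*
 * Return the number of bytes ever used. The stack grows down from
 * base + words, so the untouched words sit at the low end: count them
 * from base upward and treat everything above them as used.
 */
static size_t stack_ever_used(const uint32_t *base, size_t words)
{
  size_t untouched = 0;

  while (untouched < words && base[untouched] == TEMP_STACK_MAGIC) {
    untouched++;
  }
  return (words - untouched) * sizeof(uint32_t);
}
```

One limitation of this approach: if the deepest word the code ever touched happened to be written with the magic value itself, the scan runs past it and slightly undercounts.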
>>>>>>
>>>>>> OK, this may sound completely unreasonable, but seeing those
>>>>>> implementations overwrite callee-saved registers without saving them
>>>>>> makes my brain unhappy. (Yes, I know.)
>>>>>>
>>>>>> Could they either:
>>>>>> - Have a comment prepended establishing the implicit ABI of which
>>>>>>   registers the caller cannot rely on reusing after return.
>>>>>>   Preferably somewhat echoed at the call site.
>>>>>> - Be rewritten to use only scratch registers?
>>>>>>
>>>>>
>>>>> I think it is implied that the startup code does not adhere to the
>>>>> AAPCS. That code already uses r5 and r6 without stacking them, simply
>>>>> because we're in the middle of preparing the stack and other execution
>>>>> context, precisely so the C code we call into can rely on AAPCS
>>>>> guarantees.
>>>>
>>>>
>>>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>>>> jumps back to a local label. There are no function calls here until
>>>> the point where we call into C (with the exception of the lovely
>>>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>>>> it can use)
>>>
>>> Please continue the discussion with Leif on this; from my side, I'm
>>> happy with the patch (I've sort of deduced what the assembly code does,
>>> also relying on your v1 notes).
>>>
>>> The only eyebrow-raising part was:
>>>
>>> +  MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
>>> +             FixedPcdGet32 (PcdInitValueInTempStack) << 32)
>>>
>>> where we left-shift a constant that is "in theory" UINT32 by 32 binary
>>> places, using the << operator. In C that would be undefined behavior,
>>> but this is assembly, so what do I know? ¯\_(ツ)_/¯
>>>
>>> Acked-by: Laszlo Ersek <ler...@redhat.com>
>>>
>>
>> Thanks. And you're right, this is not C so no need to worry about that.
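For comparison, if the same 64-bit fill pattern were built in C, the 32-bit value would have to be widened before it is shifted; shifting a 32-bit operand by 32 is exactly the undefined behavior Laszlo mentions. A minimal sketch: INIT_VALUE_32 and replicate32() are hypothetical names, and 0x5AA55AA5 is only an assumed stand-in for the PCD value.

```c
#include <stdint.h>

/* Hypothetical stand-in for FixedPcdGet32 (PcdInitValueInTempStack). */
#define INIT_VALUE_32 UINT32_C(0x5AA55AA5)

/*
 * Replicate a 32-bit pattern into both halves of a 64-bit word.
 * The cast comes first, so the shift is done on a 64-bit operand and
 * is well defined; (v << 32) on a 32-bit operand would not be.
 */
static uint64_t replicate32(uint32_t v)
{
  return (uint64_t)v | ((uint64_t)v << 32);
}
```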
>>
>>> (
>>>
>>> By the way, just to see if I remember correctly, isn't STP:
>>>
>>> +0:stp   x9, x9, [x8], #16
>>>
>>> the kind of instruction that modifies multiple operands at once, and so
>>> if it faults, it cannot be virtualized well? (Because the syndrome
>>> register or whatever does not tell the VMM the whole picture about the
>>> fault?)
>>>
>>> Totally irrelevant here, I'm just curious.
>>>
>>
>> STP == STore Pair, and so it stores the values in the registers to
>> memory. The only register that gets modified here is x8, due to the
>> post-increment.
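A rough C model of what the quoted fill loop does: STP writes two registers' worth of data to memory, and the post-index form then adds 16 to the base register, so x8 is the only register that changes. This is a sketch under assumptions: the loop's compare-and-branch back to label 0 is not quoted in this thread, so it is modeled as a do/while that runs until the pointer reaches an assumed end address, and the region is assumed to be a non-zero multiple of 16 bytes. fill_pairs() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Model of "0: stp x9, x9, [x8], #16" plus an assumed loop-back.
 * x8 is the base pointer, x9 the 64-bit fill pattern. Returns the
 * final value of x8, the only "register" the loop modifies.
 */
static uint64_t *fill_pairs(uint64_t *x8, const uint64_t *end, uint64_t x9)
{
  do {
    x8[0] = x9;  /* first register of the pair, stored at [x8]      */
    x8[1] = x9;  /* second register of the pair, stored at [x8 + 8] */
    x8 += 2;     /* post-index writeback: x8 += 16 bytes            */
  } while (x8 < end);
  return x8;
}
```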
>>
> 
> ... which actually doesn't mean it is not affected by the same issue.
> 
> The reason such instructions are more difficult to virtualize is that
> KVM has to decode the instruction itself, rather than read the
> syndrome register, which tells it which register the access was meant
> to read from or write to. So it is perfectly feasible to virtualize
> them; the KVM authors just haven't bothered yet.
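To make "decode the instruction" concrete: when the syndrome does not describe the access, the hypervisor has to fetch the faulting instruction and extract the registers and offset from its encoding. A minimal sketch for the 64-bit integer LDP/STP post-index form only; the field positions follow the A64 encoding as I understand it (worth double-checking against the Arm ARM), and the struct and function names are hypothetical, not KVM's. A real emulator would also have to handle the other addressing modes, the 32-bit and SIMD variants, and fetching the instruction from guest memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operands of a 64-bit integer load/store pair, post-index form. */
struct ldst_pair {
  bool     is_load;  /* L bit: true = LDP, false = STP           */
  unsigned rt;       /* first data register                      */
  unsigned rt2;      /* second data register                     */
  unsigned rn;       /* base register, written back after access */
  int64_t  offset;   /* byte offset added to Rn after the access */
};

/*
 * Layout: opc(31:30)=10 | 101 | V(26)=0 | 001 | L(22) | imm7(21:15) |
 *         Rt2(14:10) | Rn(9:5) | Rt(4:0)
 * Returns false for any other encoding.
 */
static bool decode_ldst_pair_post(uint32_t insn, struct ldst_pair *out)
{
  int32_t imm7;

  if ((insn & UINT32_C(0xFF800000)) != UINT32_C(0xA8800000)) {
    return false;
  }
  imm7 = (int32_t)((insn >> 15) & 0x7Fu);
  if (imm7 & 0x40) {
    imm7 -= 0x80;                    /* sign-extend the 7-bit field */
  }
  out->is_load = ((insn >> 22) & 1u) != 0;
  out->rt      = insn & 0x1Fu;
  out->rn      = (insn >> 5) & 0x1Fu;
  out->rt2     = (insn >> 10) & 0x1Fu;
  out->offset  = (int64_t)imm7 * 8;  /* scaled by the 8-byte access size */
  return true;
}
```

By my hand encoding, "stp x9, x9, [x8], #16" is 0xA8812509, which this decodes to Rt = Rt2 = 9, Rn = 8 and an offset of 16.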

Hm, I'm slightly curious if and how this differs from x86 KVM :) In x86
KVM there are huge instruction tables for emulation etc.

Anyway I'm happy this patch is now committed!

Thanks!
Laszlo

> 
>> But its converse
>>
>> LDP  <reg>, <reg>, [<reg>], #<const>
>>
>> is indeed such an instruction, given that it modifies three registers
>> at once, and so the registers that encode the exception run out of
>> space. Note that this only affects virtualized MMIO.
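The "running out of space" shows up in the data abort syndrome: as I understand the Arm ARM, the ISS field of ESR_ELx names a single transfer register (SRT), and its valid bit (ISV) is only set for single-register loads and stores without writeback. LDP and STP, with or without post-index, therefore arrive with ISV clear. A minimal sketch of the check a hypervisor's MMIO path might make; the macro, struct and function names are hypothetical, and the bit positions are worth verifying against the Arm ARM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected ISS fields of ESR_ELx for a data abort. */
#define ISS_MASK     UINT32_C(0x1FFFFFF)      /* ISS is ESR bits [24:0]      */
#define ISS_ISV      (UINT32_C(1) << 24)      /* syndrome fields are valid   */
#define ISS_SAS(iss) (((iss) >> 22) & 0x3u)   /* access size: 1 << SAS bytes */
#define ISS_SRT(iss) (((iss) >> 16) & 0x1Fu)  /* the single transfer register */
#define ISS_WNR      (UINT32_C(1) << 6)       /* 1 = write, 0 = read         */

struct mmio_access {
  bool     is_write;
  unsigned reg;   /* one register number, taken from SRT */
  unsigned size;  /* access size in bytes                */
};

/*
 * Describe the access from the syndrome alone. Returns false when ISV
 * is clear, i.e. the hypervisor would have to decode the instruction.
 */
static bool mmio_from_syndrome(uint32_t esr, struct mmio_access *out)
{
  uint32_t iss = esr & ISS_MASK;

  if (!(iss & ISS_ISV)) {
    return false;
  }
  out->is_write = (iss & ISS_WNR) != 0;
  out->reg      = ISS_SRT(iss);
  out->size     = 1u << ISS_SAS(iss);
  return true;
}
```

With only one SRT field and one writeback-free access to describe, there is simply no room in this syndrome for the second data register and the base register update of an LDP.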

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
