On 11/09/17 22:11, Ard Biesheuvel wrote:
> On 7 November 2017 at 18:13, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>> On 7 November 2017 at 18:09, Laszlo Ersek <ler...@redhat.com> wrote:
>>> On 11/05/17 17:29, Ard Biesheuvel wrote:
>>>> On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheu...@linaro.org>
>>>> wrote:
>>>>> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindh...@linaro.org>
>>>>> wrote:
>>>>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>>>>> For example,
>>>>>>>
>>>>>>>   Total temporary memory: 16352 bytes.
>>>>>>>   temporary memory stack ever used: 4820 bytes.
>>>>>>>   temporary memory heap used for HobList: 4720 bytes.
>>>>>>>
>>>>>>> Tracking stack utilization like this requires the stack to be seeded
>>>>>>> with a known magic value, and this needs to occur before entering C
>>>>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>>>>> to implement this feature, but it is useful nonetheless, so let's
>>>>>>> wire it up for PrePeiCore as well.
>>>>>>>
>>>>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>>>>>>
>>>>>> OK, this may sound completely unreasonable, but seeing those
>>>>>> implementations overwrite callee-saved registers without saving them
>>>>>> makes my brain unhappy. (Yes, I know.)
>>>>>>
>>>>>> Could they either:
>>>>>> - Have a comment prepended establishing the implicit ABI of which
>>>>>>   registers the caller cannot rely on reusing after return.
>>>>>>   Preferably somewhat echoed at the call site.
>>>>>> - Be rewritten to use only scratch registers?
>>>>>>
>>>>>
>>>>> I think it is implied that the startup code does not adhere to the
>>>>> AAPCS.
>>>>> That code already uses r5 and r6 without stacking them, simply
>>>>> because we're in the middle of preparing the stack and other execution
>>>>> context, precisely so the C code we call into can rely on AAPCS
>>>>> guarantees.
>>>>
>>>>
>>>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>>>> jumps back to a local label. There are no functions calls here until
>>>> the point where we call into C (with the exception of the lovely
>>>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>>>> it can use)
>>>
>>> Please continue the discussion with Leif on this; from my side, I'm
>>> happy with the patch (I've sort of deduced what the assembly code does,
>>> also relying on your v1 notes).
>>>
>>> The only eyebrow-raising part was:
>>>
>>> + MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
>>> + FixedPcdGet32 (PcdInitValueInTempStack) << 32)
>>>
>>> where we left-shift a constant that is "in theory" UINT32 by 32 binary
>>> places, using the << operator. In C that would be undefined behavior,
>>> but this is assembly, so what do I know? ¯\_(ツ)_/¯
>>>
>>> Acked-by: Laszlo Ersek <ler...@redhat.com>
>>>
>>
>> Thanks. And you're right, this is not C so no need to worry about that.
>>
>>> (
>>>
>>> By the way, just to see if I remember correctly, isn't STP:
>>>
>>> +0:stp x9, x9, [x8], #16
>>>
>>> the kind of instruction that modifies multiple operands at once, and so
>>> if it faults, it cannot be virtualized well? (Because the syndrome
>>> register or whatever does not tell the VMM the whole picture about the
>>> fault?)
>>>
>>> Totally irrelevant here, I'm just curious.
>>>
>>
>> STP == STore Pair, and so it stores the values in the registers to
>> memory. The only register that gets modified here is x8, due to the
>> post-increment.
>>
>
> ... which actually doesn't mean it is not affected by the same issue.
>
> The reason such instructions are more difficult to virtualize is that
> it requires KVM to decode the instruction, rather than read the
> syndrome registers that can tell it which register we intended to
> read/write from. So it is in fact perfectly feasible to virtualize it,
> but the KVM authors just haven't bothered yet.
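
To make the point about syndrome registers concrete, here is a minimal C sketch of the choice a hypervisor faces on a trapped MMIO access. It is not KVM's code: the struct and function names are made up for illustration, and the bit positions (ISV at bit 24, SRT at bits 20:16, WnR at bit 6) are my reading of the data abort syndrome layout in the Arm architecture manual, so please check them there before relying on them. When ISV is clear, which as far as I know is the case for pair and writeback forms such as the post-indexed STP/LDP above, the syndrome names no register and the faulting instruction itself would have to be fetched and decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data abort syndrome fields (assumed layout, see the note above). */
#define ESR_ISV        (UINT32_C(1) << 24)  /* instruction syndrome valid */
#define ESR_SRT_SHIFT  16                   /* transfer register number   */
#define ESR_SRT_MASK   UINT32_C(0x1F)
#define ESR_WNR        (UINT32_C(1) << 6)   /* write, not read            */

/* Hypothetical summary of a trapped MMIO access. */
struct mmio_fault {
    bool     needs_decode;  /* syndrome alone is not enough          */
    bool     is_write;      /* store (true) or load (false)          */
    unsigned reg;           /* only meaningful if !needs_decode      */
};

/* Classify a data abort from its syndrome value alone. */
static struct mmio_fault classify_mmio_fault(uint32_t esr)
{
    struct mmio_fault f;

    f.is_write = (esr & ESR_WNR) != 0;
    if (esr & ESR_ISV) {
        /* Single-register access: the hardware names the register. */
        f.needs_decode = false;
        f.reg = (esr >> ESR_SRT_SHIFT) & ESR_SRT_MASK;
    } else {
        /* No register information: the instruction must be decoded. */
        f.needs_decode = true;
        f.reg = 0;
    }
    return f;
}
```

The `needs_decode` branch is the part that, per the paragraph above, would require an instruction decoder on the hypervisor side.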

Hm, I'm slightly curious if and how this differs from x86 KVM :) In x86
KVM there are huge instruction tables for emulation etc.

Anyway I'm happy this patch is now committed!

Thanks!
Laszlo

>
>> But its converse
>>
>> LDP <reg>, <reg>, [<reg>], #<const>
>>
>> is indeed such an instruction, given that it modifies three registers
>> at once, and so the registers that encode the exception run out of
>> space. Note that this only affects virtualized MMIO.
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
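
For readers who prefer C to assembly, the seeding scheme discussed in this thread can be modelled roughly as below. This is a sketch under stated assumptions, not the PrePeiCore or PEI core code: INIT_VALUE_IN_TEMP_STACK is a placeholder standing in for FixedPcdGet32 (PcdInitValueInTempStack), and all function names are hypothetical. It fills the stack two 64-bit words at a time, much as the "stp x9, x9, [x8], #16" loop does, and then estimates usage by scanning up from the lowest address for the first word that no longer holds the seed. Note the cast before the shift, which is what avoids the undefined behavior mentioned above for C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for FixedPcdGet32 (PcdInitValueInTempStack); the value
   here is an assumption chosen for illustration only. */
#define INIT_VALUE_IN_TEMP_STACK UINT32_C(0x5AA55AA5)

/* Replicate the 32-bit seed into both halves of a 64-bit word.
   Widening to 64 bits before shifting keeps this well defined in C. */
static uint64_t seed_pattern(uint32_t v)
{
    return ((uint64_t)v << 32) | v;
}

/* Fill the stack area 16 bytes per iteration, lowest address first. */
static void seed_stack(uint64_t *base, size_t bytes, uint32_t v)
{
    uint64_t pattern = seed_pattern(v);

    assert(bytes % 16 == 0);
    for (size_t i = 0; i < bytes / sizeof(uint64_t); i += 2) {
        base[i]     = pattern;
        base[i + 1] = pattern;
    }
}

/* The stack grows downwards, so words at the low end that still hold
   the seed were never written. Everything above them counts as used. */
static size_t stack_bytes_used(const uint64_t *base, size_t bytes, uint32_t v)
{
    uint64_t pattern = seed_pattern(v);
    size_t words = bytes / sizeof(uint64_t);
    size_t untouched = 0;

    while (untouched < words && base[untouched] == pattern) {
        untouched++;
    }
    return bytes - untouched * sizeof(uint64_t);
}
```

The result is only an estimate: a stack slot that was written with a value equal to the seed is indistinguishable from one that was never touched.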