On 02/23/16 14:27, Ard Biesheuvel wrote:
> On 23 February 2016 at 13:58, Laszlo Ersek <[email protected]> wrote:
>> On 02/23/16 11:42, Yao, Jiewen wrote:
>>> Laszlo
>>>
>>> I do not understand why you only set 4G page table for X64 PEI.
>>
> 
> Actually, that question is not even relevant. The reality is simply
> that the value of PcdDxeIplSwitchToLongMode is not sufficient to
> decide whether PEI will have access to all of memory, and we need some
> other way to convey the PEI memory limit. Note that, if OVMF
> implements it that way, there may be other platforms in the wild that
> do the same, so 'fixing' OVMF does not help us at all.

Thanks for pointing this out.

> So we are back to a PCD that sets a memory limit for PEI. I suppose
> having a simple feature PCD should be sufficient here? Any suggestions
> for names?

PcdFavor64BitPeiMemory?

Thanks
Laszlo


> 
> 
> 
> 
>> Two reasons.
>>
>> * First, OVMF's PEI has minimal memory needs. The permanent PEI RAM we
>> install it is normally capped at 64MB. If the guest has large RAM, then
>> we install more PEI RAM, but only so that the DXE IPL can build page
>> tables for the entirety of the guest RAM. (In other words, in such cases
>> the permanent PEI RAM size is dictated by / dominated by the DXE IPL's
>> page table allocation.)
>>
>> * Second, OVMF is special in that it doesn't need to configure DRAM
>> hardware before it becomes usable. Therefore only the reset vector and
>> the SEC phase run from flash. SEC decompresses both PEI and DXE firmware
>> volumes into normal RAM, and PEI runs from RAM.
>>
>> In order to make this possible, originally OVMF's earliest page tables
>> were pre-built by the build process -- those page tables were statically
>> embedded into the read-only flash image, and SEC would only point CR3 at
>> them. These page tables mapped RAM up to 4GB.
>>
>> Later this broke on some KVM versions: even if we pre-set the Accessed
>> and Dirty bits in the statically pre-built page table entries, the
>> processor wanted to rewrite those PTEs (in read-only flash) when
>> accessing the mapped pages themselves. Since the PTEs existed in
>> read-only memory, this broke. (I don't recall the exact symptoms, but it
>> didn't work.)
>>
>> So Jordan rewrote the creation of the earliest page tables, for the X64
>> SEC/PEI case: it is now being done in the reset vector code. The reset
>> vector code and SEC continue to run from flash (located just below 4GB).
>> The reset vector creates the page tables (mapping memory up to 4GB)
>> dynamically, located in real RAM, at 8MB. (The size of the page tables
>> is 24KB.)
>>
>> The SEC phase is entered with paging enabled (using the tables created
>> above), and those same tables are used in PEI as well. (Until the DXE
>> IPL maps the full memory.)
>>
>>> But if X64 PEI only fills 4G page table by default, you can setup page
>>> fault handler to dynamic fix the address.
>>>
>>> That is the way how we handle full memory access in memory constrain
>>> environment, like SMM.
>>
>> I'm certain this is technically solvable, but let's make it clear that
>> the burden to make OVMF cope with such a change is not on us.
>>
>> OvmfPkg lives in the upstream edk2 repository. In any open source
>> project, in-tree client code that depends on central infrastructure is
>> not left behind when the infrastructure changes, and the burden of
>> adapting the client code to the infrastructure is also not on the
>> maintainers of the client code. Whoever is updating the infrastructure
>> is responsible for bringing forward all dependent client code, with a
>> bisectable series.
>>
>>> Also, this platform can choose not report above 4G memory as tested for
>>> DxeCore
>>
>> OVMF's PlatformPei already reports >=4GB memory as untested.
>>
>>> and report big enough ACPI/Reserved memory in
>>> MemoryTypeInformation table.
>>
>> How big is big enough?
>>
>>> Then DxeCore will allocate BIN below 4G and all ACPI/Reserved memory
>>> allocation will happen below 4G.
>>
>> Honestly, I find this way more complicated than a new PCD. A PCD is
>> explicit about its purpose.
>>
>> Whereas your suggestion, if I understand it correctly, would require us
>> to measure actual memory usage for the Memory Type Information HOB, and
>> keep it up-to-date as development progresses, new drivers are pulled in,
>> old drivers are modified, and so on.
>>
>> Also, the same OVMF binary is expected to run on QEMU when S3 support is
>> disabled (on the QEMU command line). In this case, some of the
>> foundational S3-related protocols are not installed, which prevents the
>> rest of the S3-related drivers from working. This is intentional, and it
>> impacts the amount of ACPI / Reserved memory needed.
>>
>>> We do not want to introduce a new PCD, because it is unnecessary burden
>>> to maintain a new one, especially we already have solution to handle
>>> most cases.
>>
>> The way I see it, support for a new platform family is being introduced.
>> A platform family that invalidates some of the key assumptions held (and
>> hard-coded!) in a number of core modules. I don't think a preexistent,
>> well-working platform with a non-negligible user base should be broken
>> for the newcomer's sake.
>>
>> PCDs exist for platform configuration. That's what the name says:
>> Platform Configuration Database. If a new platform is being accommodated
>> in the core modules, I think it is quite natural to reckon with some
>> additional customization and maintenance.
>>
>> For example, I would think the following library instances present some
>> additional maintenance burden too:
>>
>> - MdePkg/Library/BaseCpuLib/AArch64
>> - MdePkg/Library/BaseLib/AArch64
>> - MdePkg/Library/BaseSynchronizationLib/AArch64
>>
>> I don't debate your claim that a new PCD presents new maintenance burden
>> -- it does. I do question your claim that the burden is unnecessary.
>>
>> I've put insane amounts of work into OVMF's S3. If someone submits
>> bisectable, regression-conscious patches for OVMF that adapt OVMF to the
>> proposed repurposing of PcdDxeIplSwitchToLongMode, I will do my best to
>> test and review those patches. If no such patches will be submitted, I
>> will continue NACKing the repurposing of PcdDxeIplSwitchToLongMode.
>>
>> (Obviously, I'm also open to arguments that prove me wrong.)
>>
>> Thanks
>> Laszlo
>>
>>>
>>>
>>>
>>> Thank you
>>>
>>> Yao Jiewen
>>>
>>>
>>>
>>> *From:*Laszlo Ersek [mailto:[email protected]]
>>> *Sent:* Tuesday, February 23, 2016 5:04 PM
>>> *To:* Yao, Jiewen; Ard Biesheuvel
>>> *Cc:* Ni, Ruiyu; [email protected]; Tian, Feng; Zeng, Star;
>>> [email protected]; [email protected]; Fan, Jeff; Gao, Liming
>>> *Subject:* Re: [edk2] [PATCH v3 2/4] IntelFrameworkModulePkg: BdsDxe:
>>> only allocate below 4 GB if needed
>>>
>>>
>>>
>>> On 02/23/16 05:31, Yao, Jiewen wrote:
>>>> I discussed with Module owner. Below is summary for MdeModulePkg:
>>>>
>>>> 1) PCD Name: We suggest to use PcdDxeIplSwitchToLongMode as
>>>> indicator that PEI is 32 bit and DXE is 64 bit for X86 platform. It
>>>> might be not clear stated in PCD description, but it is the intent
>>>> when this is introduced. We suggest to define this to be FALSE for
>>>> ARM, so that our core module can use it. (Confirmed with Liming Gao)
>>>>
>>>> 2) AcpiTableDxe: As Star Zeng mentioned, we need make sure FACS is
>>>> allocate below 4G when PcdDxeIplSwitchToLongMode is set, because it is
>>>> used by S3Resume.
>>>
>>> This is not good enough. PEI might be restricted to the first 4GB of
>>> memory *even though* it is built for X64.
>>>
>>> Consider a platform with the following characteristics:
>>> - PEI is X64
>>> - DXE is X64
>>> - hence PcdDxeIplSwitchToLongMode is FALSE
>>> - PEI has page tables only for the first 4GB of RAM
>>> - S3 is supported
>>>
>>> If AcpiTableDxe keyed off of PcdDxeIplSwitchToLongMode *solely*, in
>>> order to see whether the FACS needs to be allocated under 4GB, then said
>>> platform would break.
>>>
>>> The most frequently used build of OVMF is exactly like written above.
>>>
>>>>
>>>> 3) UefiBootManagerLib: As Ruiyu Ni mentioned, we need make allocate
>>>> L"PerfDataMemAddr" below 4G for when PcdDxeIplSwitchToLongMode is set,
>>>> because it is used by S3Resume.
>>>
>>> My counter-argument is the same as above, I think.
>>>
>>>>
>>>>
>>>>
>>>> 4) BootGraphicsResourceTableDxe: There is no need to add 4G
>>>> limitation, we can remove that directly. (Confirmed with Chao Zhang)
>>>>
>>>> 5) FirmwarePerformanceDataTableDxe: We can check
>>>> PcdDxeIplSwitchToLongMode to allocate Below4G memory for
>>>> mAcpiS3PerformanceTable which is used in S3.
>>>
>>> Same counter-argument again.
>>>
>>>> And we can remove 4G
>>>> limitation for mAcpiBootPerformanceTable directly. (Confirmed with
>>>> Chao
>>>> Zhang)
>>>>
>>>> 6) PiDxeS3BootScriptLib: We can check PcdDxeIplSwitchToLongMode for
>>>> S3TableBase and NewS3TableBase which is used in S3. (Confirmed with
>>>> Star
>>>> Zeng)
>>>
>>> Same counter-argument.
>>>
>>>>
>>>> 7) BootScriptExecutorDxe: We can check PcdDxeIplSwitchToLongMode for
>>>> FfsBuffer and BootScriptExecutorBuffer which is used in S3. (Confirmed
>>>> with Star Zeng)
>>>
>>> Same counter-argument.
>>>
>>>>
>>>> 8) CapsuleRuntimeDxe: Already check PcdDxeIplSwitchToLongMode.No
>>>> update needed. (Confirmed with Star Zeng)
>>>
>>> The only reason this doesn't break under OVMF is because OVMF doesn't
>>> implement capsule updates.
>>>
>>> In summary, PcdDxeIplSwitchToLongMode==TRUE implies, but is not
>>> equivalent to, the requirement to allocate under 4GB for S3 purposes. A
>>> platform may have *both* S3 *and* a 64-bit PEI phase without access to
>>>> = 4GB memory.
>>>
>>> That's why I proposed a brand new PCD. PcdDxeIplSwitchToLongMode would
>>> imply that PCD, but the new PCD wouldn't necessarily imply
>>> PcdDxeIplSwitchToLongMode.
>>>
>>>> BTW: We are trying not use IntelFrameworkModulePkg. If there is
>>>> replacement in MdeModulePkg, we suggest use the one in MdeModulePkg,
>>>> for example, BDS or ACPI.
>>>
>>> Absolutely, but migrating a platform from IntelFrameworkModulePkg BDS to
>>> MdeModulePkg BDS is not trivial, due to lack of documentation and
>>> examples; plus some functionality that is publicly available from
>>> GenericBdsLib (and is used by PlatformBdsLib instances) needs to be
>>> exported from the new boot manager library.
>>>
>>> Thanks
>>> Laszlo
>>>
>>>>
>>>>
>>>>
>>>> Thank you
>>>>
>>>> Yao Jiewen
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:*Ard Biesheuvel [mailto:[email protected]]
>>>> *Sent:* Monday, February 22, 2016 8:43 PM
>>>> *To:* Yao, Jiewen
>>>> *Cc:* Ni, Ruiyu; Laszlo Ersek; [email protected] 
>>>> <mailto:[email protected]>; Tian, Feng;
>>>> Zeng, Star; [email protected] <mailto:[email protected]>;
>>> [email protected] <mailto:[email protected]>; Fan,
>>>> Jeff; Gao, Liming
>>>> *Subject:* Re: [edk2] [PATCH v3 2/4] IntelFrameworkModulePkg: BdsDxe:
>>>> only allocate below 4 GB if needed
>>>>
>>>>
>>>>
>>>> On 22 February 2016 at 13:40, Yao, Jiewen wrote:
>>>>> I did a search on current MdeModulePkg. I found there are more
>>>>> modules allocating Below4G memory.
>>>>>
>>>>> Besides BDS, we have BootGraphicsResourceTableDxe,
>>>>> BootScriptExecutorDxe, FirmwarePerformanceDataTableDxe, 
>>>>> PiDxeS3BootScriptLib, CapsuleRuntimeDxe.
>>>>>
>>>>>
>>>>>
>>>>> BootGraphicsResourceTableDxe – I am not clear.
>>>>>
>>>>> CapsuleRuntimeDxe – Below4G is designed for PEI capsule. Or we do not
>>>>> need such capability.
>>>>>
>>>>> BootScriptExecutorDxe, FirmwarePerformanceDataTableDxe,
>>>>> PiDxeS3BootScriptLib
>>>>> - data need to be access in PEI phase.
>>>>>
>>>>>
>>>>>
>>>>> I think we might need more clean up for 32bit PEI.
>>>>>
>>>>
>>>> Agreed.
>>>>
>>>>> I am still thinking if we can reuse old Pcd – we can clarify in PCD
>>>>> description – it is only for 32bit PEI and 64bit DXE.
>>>>>
>>>>> Or a new PCD, and name to Pcd32BitPei? Please allow me collect more
>>>>> feedback on that.
>>>>>
>>>>
>>>> OK
>>>>
>>>>> In order to separate 32bit PEI issue from ACPI issue, I think we can
>>>>> check in ACPI patch at first, if you want.
>>>>>
>>>>
>>>> Should I wait for the maintainers of MdeModulePkg to give their
>>>> Reviewed-by ?
>>>>
>>>
>>

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Reply via email to