Ok, thanks for clarification. I just answer the last question below:
- How does the Conclusion follow from Premises 1 through 4? What guarantees it? - Premise 1: OVMF installs memory that is >= 4GB as an untested memory resource descriptor HOB. - Premise 2: OVMF creates a memory type information HOB that is out of date, and will *always* be out of date. (It is simply impossible to keep that HOB up-to-date, or to reliably detect when it becomes stale.) - Premise 3: All of the guest RAM, including >= 4GB, is available to UEFI drivers and applications. - Premise 4: The affected drivers will be allowed to allocate memory >= 4GB if PcdDxeIplSwitchToLongMode is FALSE. - Conclusion: No memory area that is accessed during PEI on the S3 resume path is allocated from >= 4GB memory during DXE and BDS on the normal boot path. [Jiewen] Premise 2 is very important. It will cause DxeCore always find a piece of memory to pre-allocate BIN. It means DxeCore will *always* find memory in BIN, to allocate ACPI/Reserved/RT memory. BDS will auto update the memory type info hob into a variable. The proper way is to get variable and sync variable data to Hob. If Premise 1 is satisfied, DxeCore will use tested Below4G memory as BIN, to pre-allocate ACPI/Reserved/RT memory. This is one of first steps in DxeCore initialization. Premise 3 is not related, because memory test driver will *test* untested memory. After that DxeCore *can* use that. It does not mean DxeCore *has to* allocate above 4G. DxeCore still looks for BIN location as first priority. Premise 4 is correct only on BS/Loader memory, because of TopDown policy. For ACPI/Reserved/RT memory, DxeCore looks for BIN location as first priorty. Because BIN is below 4G, ACPI/Reserved/RT memory is below 4G. Conclusion: since PEI S3 phase only access ACPI/Reserved memory and they are below 4G, it is safe. I completely understand the whole process is complicated, so we describe memory type information table in https://firmware.intel.com/sites/default/files/resources/A_Tour_Beyond_BIOS_Memory_Map_in%20UEFI_BIOS.pdf, “Memory Map in S4 resume” section. Please let me know if it is clear enough. If not, I can give more description. Thank you Yao Jiewen From: edk2-devel [mailto:[email protected]] On Behalf Of Laszlo Ersek Sent: Tuesday, February 23, 2016 10:42 PM To: Yao, Jiewen; Ard Biesheuvel Cc: Ni, Ruiyu; Tian, Feng; [email protected]; Justen, Jordan L; [email protected]; [email protected]; Gao, Liming; Fan, Jeff; Zeng, Star Subject: Re: [edk2] [PATCH v3 2/4] IntelFrameworkModulePkg: BdsDxe: only allocate below 4 GB if needed On 02/23/16 14:39, Yao, Jiewen wrote: > Understood. I think the 4G for PEI x64 makes sense. Thanks! > I checked OVMF code. In OvmfPkg\PlatformPei\Platform.c(44) > > EFI_MEMORY_TYPE_INFORMATION mDefaultMemoryTypeInformation[] = { > > { EfiACPIMemoryNVS, 0x004 }, > > { EfiACPIReclaimMemory, 0x008 }, > > { EfiReservedMemoryType, 0x004 }, > > { EfiRuntimeServicesData, 0x024 }, > > { EfiRuntimeServicesCode, 0x030 }, > > { EfiBootServicesCode, 0x180 }, > > { EfiBootServicesData, 0xF00 }, > > { EfiMaxMemoryType, 0x000 } > > }; > > This is memory type information I mentioned. OVMF already has it. Yes, I have been aware. In fact, after seeing Ard's commit commit c1993157bd81e5ab440cf24ef77b0b3fb325e89f Author: Ard Biesheuvel Date: Wed Jun 3 16:11:47 2015 +0000 ArmVirtPkg: increase memory preallocations to reduce region count I got inspired to do something similar for OvmfPkg myself. Ultimately I never posted the patch for OvmfPkg, because it didn't help much. Here's the commit message, from my local repo: > From b357e8d88c0304ea2b31aefafe53d06c9769fb78 Mon Sep 17 00:00:00 2001 > From: Laszlo Ersek > Date: Thu, 17 Sep 2015 16:18:46 +0200 > Subject: [PATCH] OvmfPkg: PlatformPei: decrease memmap fragmentation > > Inspired by ArmVirtPkg commit c199315 ("ArmVirtPkg: increase memory > preallocations to reduce region count"), I checked the number of > entries in the UEFI memory map, as dumped by the UEFI shell's MEMMAP > command, and by the Linux kernel. The number of entries is quite high, about > 50-55. > > I calculated the new preallocations as follows: > - added 15% to each byte count usage reported by the MEMMAP command, for > some future-proofing, > - expressed the result in kilobytes (both pages and byte counts are hard > to read), > - just for our information, I calculated the ratio between the new > preallocation and the old one. > > For example, the UEFI shell reported 44 pages (180224 bytes) of > reserved memory usage. The new preallocation, expressed in kilobytes, > is > trunc(180224 * 1.15 / 1024) = 202. This preallocation is approx. 12.62 > times the previous preallocation (which was 4 pages, ie. 16384 bytes). > > Here's the full table: > > memory type pages from bytes from new KB factor of former > MEMMAP cmd MEMMAP cmd prealloc prealloc > ----------- ---------- ---------- -------- ---------------- > Reserved 44 180224 202 12.62 > LoaderCode 313 1282048 1439 n/a > BS_Code 1300 5324800 5980 3.89 > BS_Data 9053 37081088 41643 2.71 > RT_Code 223 913408 1025 5.33 > RT_Data 789 3231744 3629 25.20 > ACPI_Recl 8 32768 36 1.12 > ACPI_NVS 283 1159168 1301 81.31 > > ... Unfortunately, when the patch is applied, the memory map remains > fragmented; mostly due to small unused Conventional Memory entries > between other types of allocations. The entry count doesn't go below 40. > > Contributed-under: TianoCore Contribution Agreement 1.0 > Signed-off-by: Laszlo Ersek > --- > OvmfPkg/PlatformPei/Platform.c | 17 +++++++++-------- > 1 file changed, 9 insertions(+), 8 deletions(-) Back to your email: On 02/23/16 14:39, Yao, Jiewen wrote: > > (Although the table itself is wrong, the BootServicesCode and > BootServicesData should be removed.) The table is completely bogus; I'm only not dropping it as-is because that's the easiest. It doesn't cause problems. > Also OvmfPkg\PlatformPeihas AddUntestedMemoryBaseSizeHob (BASE_4GB, > UpperMemorySize); > > It means above 4G is untested. Yes, just as I wrote in my previous email. > In conclusion, I am 100% sure that the ACPI allocation will still > always be Below4GB after we adopt the change, because that is > controlled by DXE CORE. (Unless there is bug, and we need fix the > bug.) I'd like to see a rigorous proof for this claim. Namely, if I boot OVMF right now, with a 6GB guest, and then enter the UEFI shell, the MEMMAP command reports the following: Reserved : 2,159 Pages (8,843,264 Bytes) LoaderCode: 320 Pages (1,310,720 Bytes) LoaderData: 0 Pages (0 Bytes) BS_Code : 1,390 Pages (5,693,440 Bytes) BS_Data : 6,324 Pages (25,903,104 Bytes) RT_Code : 97 Pages (397,312 Bytes) RT_Data : 598 Pages (2,449,408 Bytes) ACPI_Recl : 9 Pages (36,864 Bytes) ACPI_NVS : 3,111 Pages (12,742,656 Bytes) MMIO : 0 Pages (0 Bytes) MMIO_Port : 0 Pages (0 Bytes) PalCode : 0 Pages (0 Bytes) Available : 1,559,272 Pages (6,386,778,112 Bytes) The 6GB RAM is almost all free and available to UEFI applications. Where does it come from? I guess (but I am not fully sure) that this happens because the untested memory is promoted to tested memory, in the following driver: MdeModulePkg/Universal/MemoryTest/NullMemoryTestDxe/NullMemoryTest.c I think the driver is invoked through EFI_GENERIC_MEMORY_TEST_PROTOCOL, via the following call chain: BdsEntry() [IntelFrameworkModulePkg/Universal/BdsDxe/BdsEntry.c] PlatformBdsPolicyBehavior() [OvmfPkg/Library/PlatformBdsLib/BdsPlatform.c] PlatformBdsDiagnostics() [OvmfPkg/Library/PlatformBdsLib/BdsPlatform.c] BdsMemoryTest() [IntelFrameworkModulePkg/Universal/BdsDxe/MemoryTest.c] ... gEfiGenericMemTestProtocolGuid ... > Then I believe you will not see any problem on below configuration. Again, what guarantees that the allocations in question will be satisfied from < 4GB memory, despite the fact that by the time we reach the UEFI shell, all of it seems to be available? Is your argument that by the time BDS runs and invokes the Null memory test, promoting untested memory to tested memory, every single one of the relevant allocations (across all the affected drivers) will have been performed? ... If there's a detailed and/or spec-based proof for this, I think I'd be fine with that, but then the proof should be written up in one of the patches that repurpose PcdDxeIplSwitchToLongMode. And then I shall add a comment to the AddUntestedMemoryBaseSizeHob() call in OVMF's PlatformPei, so that the implications are fully clear. > >> Consider a platform with the following characteristics: >> - PEI is X64 >> - DXE is X64 >> - hence PcdDxeIplSwitchToLongMode is FALSE >> - PEI has page tables only for the first 4GB of RAM >> - S3 is supported > > I agree PCD is good way for platform to configure core module behavior. > And we also try to avoid PCD abuse. > > In this case, I think Ard gave reasonable justification for ACPI table > allocation on any memory (instead of below 4G) to avoid fragmentation. > I believe PCD is good way to resolve the problem, and I do not have > any idea on how to resolve it. That is why we choose PCD solution – > PcdAcpiTableVersion. > > 4G x64 PEI can be resolved and is already resolved without a new PCD. > So I do not see the immediate need to have a new one. I agree that introducing more than one PCD for this purpose looks overkill. But, doesn't that imply that PcdAcpiTableVersion wasn't generic enough ultimately? I mean, if we're just introducing PcdAcpiTableVersion, then *instead* of it, introduce a PCD with a more generic name that can control *all* of these allocations, across all the relevant drivers. Don't make it specific to ACPI. > I fully agree with you that any MdeModulePkg patch should still keep > OVMF work. That is Core’s responsibility to make compatible change. > > If you find 4G X64 PEI S3 is broken later, please let me know. I will > be happy to discuss with you on the solution. For now, I believe it > should work well. Again, I'd like to see rigorous proof for the following theorem: - Premise 1: OVMF installs memory that is >= 4GB as an untested memory resource descriptor HOB. - Premise 2: OVMF creates a memory type information HOB that is out of date, and will *always* be out of date. (It is simply impossible to keep that HOB up-to-date, or to reliably detect when it becomes stale.) - Premise 3: All of the guest RAM, including >= 4GB, is available to UEFI drivers and applications. - Premise 4: The affected drivers will be allowed to allocate memory >= 4GB if PcdDxeIplSwitchToLongMode is FALSE. - Conclusion: No memory area that is accessed during PEI on the S3 resume path is allocated from >= 4GB memory during DXE and BDS on the normal boot path. How does the Conclusion follow from Premises 1 through 4? What guarantees it? Thanks Laszlo > Thank you > > Yao Jiewen > > > > *From:*edk2-devel [mailto:[email protected]] *On Behalf > Of *Laszlo Ersek > *Sent:* Tuesday, February 23, 2016 8:59 PM > *To:* Yao, Jiewen; Ard Biesheuvel > *Cc:* Ni, Ruiyu; Tian, Feng; > [email protected]<mailto:[email protected]>; Justen, Jordan > L; [email protected]<mailto:[email protected]>; > [email protected]<mailto:[email protected]>; Gao, Liming; Fan, > Jeff; Zeng, Star > *Subject:* Re: [edk2] [PATCH v3 2/4] IntelFrameworkModulePkg: BdsDxe: > only allocate below 4 GB if needed > > > > On 02/23/16 11:42, Yao, Jiewen wrote: >> Laszlo >> >> I do not understand why you only set 4G page table for X64 PEI. > > Two reasons. > > * First, OVMF's PEI has minimal memory needs. The permanent PEI RAM we > install it is normally capped at 64MB. If the guest has large RAM, > then we install more PEI RAM, but only so that the DXE IPL can build > page tables for the entirety of the guest RAM. (In other words, in > such cases the permanent PEI RAM size is dictated by / dominated by > the DXE IPL's page table allocation.) > > * Second, OVMF is special in that it doesn't need to configure DRAM > hardware before it becomes usable. Therefore only the reset vector and > the SEC phase run from flash. SEC decompresses both PEI and DXE > firmware volumes into normal RAM, and PEI runs from RAM. > > In order to make this possible, originally OVMF's earliest page tables > were pre-built by the build process -- those page tables were > statically embedded into the read-only flash image, and SEC would only > point CR3 at them. These page tables mapped RAM up to 4GB. > > Later this broke on some KVM versions: even if we pre-set the Accessed > and Dirty bits in the statically pre-built page table entries, the > processor wanted to rewrite those PTEs (in read-only flash) when > accessing the mapped pages themselves. Since the PTEs existed in > read-only memory, this broke. (I don't recall the exact symptoms, but > it didn't work.) > > So Jordan rewrote the creation of the earliest page tables, for the > X64 SEC/PEI case: it is now being done in the reset vector code. The > reset vector code and SEC continue to run from flash (located just below 4GB). > The reset vector creates the page tables (mapping memory up to 4GB) > dynamically, located in real RAM, at 8MB. (The size of the page tables > is 24KB.) > > The SEC phase is entered with paging enabled (using the tables created > above), and those same tables are used in PEI as well. (Until the DXE > IPL maps the full memory.) > >> But if X64 PEI only fills 4G page table by default, you can setup >> page fault handler to dynamic fix the address. >> >> That is the way how we handle full memory access in memory constrain >> environment, like SMM. > > I'm certain this is technically solvable, but let's make it clear that > the burden to make OVMF cope with such a change is not on us. > > OvmfPkg lives in the upstream edk2 repository. In any open source > project, in-tree client code that depends on central infrastructure is > not left behind when the infrastructure changes, and the burden of > adapting the client code to the infrastructure is also not on the > maintainers of the client code. Whoever is updating the infrastructure > is responsible for bringing forward all dependent client code, with a > bisectable series. > >> Also, this platform can choose not report above 4G memory as tested >> for DxeCore > > OVMF's PlatformPei already reports >=4GB memory as untested. > >> and report big enough ACPI/Reserved memory in MemoryTypeInformation >> table. > > How big is big enough? > >> Then DxeCore will allocate BIN below 4G and all ACPI/Reserved memory >> allocation will happen below 4G. > > Honestly, I find this way more complicated than a new PCD. A PCD is > explicit about its purpose. > > Whereas your suggestion, if I understand it correctly, would require > us to measure actual memory usage for the Memory Type Information HOB, > and keep it up-to-date as development progresses, new drivers are > pulled in, old drivers are modified, and so on. > > Also, the same OVMF binary is expected to run on QEMU when S3 support > is disabled (on the QEMU command line). In this case, some of the > foundational S3-related protocols are not installed, which prevents > the rest of the S3-related drivers from working. This is intentional, > and it impacts the amount of ACPI / Reserved memory needed. > >> We do not want to introduce a new PCD, because it is unnecessary >> burden to maintain a new one, especially we already have solution to >> handle most cases. > > The way I see it, support for a new platform family is being introduced. > A platform family that invalidates some of the key assumptions held > (and > hard-coded!) in a number of core modules. I don't think a preexistent, > well-working platform with a non-negligible user base should be broken > for the newcomer's sake. > > PCDs exist for platform configuration. That's what the name says: > Platform Configuration Database. If a new platform is being > accommodated in the core modules, I think it is quite natural to > reckon with some additional customization and maintenance. > > For example, I would think the following library instances present > some additional maintenance burden too: > > - MdePkg/Library/BaseCpuLib/AArch64 > - MdePkg/Library/BaseLib/AArch64 > - MdePkg/Library/BaseSynchronizationLib/AArch64 > > I don't debate your claim that a new PCD presents new maintenance > burden > -- it does. I do question your claim that the burden is unnecessary. > > I've put insane amounts of work into OVMF's S3. If someone submits > bisectable, regression-conscious patches for OVMF that adapt OVMF to > the proposed repurposing of PcdDxeIplSwitchToLongMode, I will do my > best to test and review those patches. If no such patches will be > submitted, I will continue NACKing the repurposing of > PcdDxeIplSwitchToLongMode. > > (Obviously, I'm also open to arguments that prove me wrong.) > > Thanks > Laszlo > >> >> >> >> Thank you >> >> Yao Jiewen >> >> >> >> *From:*Laszlo Ersek [mailto:[email protected]] >> *Sent:* Tuesday, February 23, 2016 5:04 PM >> *To:* Yao, Jiewen; Ard Biesheuvel >> *Cc:* Ni, Ruiyu; [email protected]<mailto:[email protected]> >> ; Tian, Feng; > Zeng, Star; >> [email protected]<mailto:[email protected]> ; > [email protected]<mailto:[email protected]> ; Fan, > Jeff; Gao, >> Liming >> *Subject:* Re: [edk2] [PATCH v3 2/4] IntelFrameworkModulePkg: BdsDxe: >> only allocate below 4 GB if needed >> >> >> >> On 02/23/16 05:31, Yao, Jiewen wrote: >>> I discussed with Module owner. Below is summary for MdeModulePkg: >>> >>> 1) PCD Name: We suggest to use PcdDxeIplSwitchToLongMode as >>> indicator that PEI is 32 bit and DXE is 64 bit for X86 platform. It >>> might be not clear stated in PCD description, but it is the intent >>> when this is introduced. We suggest to define this to be FALSE for >>> ARM, so that our core module can use it. (Confirmed with Liming Gao) >>> >>> 2) AcpiTableDxe: As Star Zeng mentioned, we need make sure FACS is >>> allocate below 4G when PcdDxeIplSwitchToLongMode is set, because it >>> is used by S3Resume. >> >> This is not good enough. PEI might be restricted to the first 4GB of >> memory *even though* it is built for X64. >> >> Consider a platform with the following characteristics: >> - PEI is X64 >> - DXE is X64 >> - hence PcdDxeIplSwitchToLongMode is FALSE >> - PEI has page tables only for the first 4GB of RAM >> - S3 is supported >> >> If AcpiTableDxe keyed off of PcdDxeIplSwitchToLongMode *solely*, in >> order to see whether the FACS needs to be allocated under 4GB, then >> said platform would break. >> >> The most frequently used build of OVMF is exactly like written above. >> >>> >>> 3) UefiBootManagerLib: As Ruiyu Ni mentioned, we need make allocate >>> L"PerfDataMemAddr" below 4G for when PcdDxeIplSwitchToLongMode is >>> set, because it is used by S3Resume. >> >> My counter-argument is the same as above, I think. >> >>> >>> >>> >>> 4) BootGraphicsResourceTableDxe: There is no need to add 4G >>> limitation, we can remove that directly. (Confirmed with Chao Zhang) >>> >>> 5) FirmwarePerformanceDataTableDxe: We can check >>> PcdDxeIplSwitchToLongMode to allocate Below4G memory for >>> mAcpiS3PerformanceTable which is used in S3. >> >> Same counter-argument again. >> >>> And we can remove 4G >>> limitation for mAcpiBootPerformanceTable directly. (Confirmed with >>> Chao >>> Zhang) >>> >>> 6) PiDxeS3BootScriptLib: We can check PcdDxeIplSwitchToLongMode for >>> S3TableBase and NewS3TableBase which is used in S3. (Confirmed with >>> Star >>> Zeng) >> >> Same counter-argument. >> >>> >>> 7) BootScriptExecutorDxe: We can check PcdDxeIplSwitchToLongMode for >>> FfsBuffer and BootScriptExecutorBuffer which is used in S3. >>> (Confirmed with Star Zeng) >> >> Same counter-argument. >> >>> >>> 8) CapsuleRuntimeDxe: Already check PcdDxeIplSwitchToLongMode.No >>> update needed. (Confirmed with Star Zeng) >> >> The only reason this doesn't break under OVMF is because OVMF doesn't >> implement capsule updates. >> >> In summary, PcdDxeIplSwitchToLongMode==TRUE implies, but is not >> equivalent to, the requirement to allocate under 4GB for S3 purposes. >> A platform may have *both* S3 *and* a 64-bit PEI phase without access >> to >>>= 4GB memory. >> >> That's why I proposed a brand new PCD. PcdDxeIplSwitchToLongMode >> would imply that PCD, but the new PCD wouldn't necessarily imply >> PcdDxeIplSwitchToLongMode. >> >>> BTW: We are trying not use IntelFrameworkModulePkg. If there is >>> replacement in MdeModulePkg, we suggest use the one in MdeModulePkg, >>> for example, BDS or ACPI. >> >> Absolutely, but migrating a platform from IntelFrameworkModulePkg BDS >> to MdeModulePkg BDS is not trivial, due to lack of documentation and >> examples; plus some functionality that is publicly available from >> GenericBdsLib (and is used by PlatformBdsLib instances) needs to be >> exported from the new boot manager library. >> >> Thanks >> Laszlo >> >>> >>> >>> >>> Thank you >>> >>> Yao Jiewen >>> >>> >>> >>> >>> >>> *From:*Ard Biesheuvel [mailto:[email protected]] >>> *Sent:* Monday, February 22, 2016 8:43 PM >>> *To:* Yao, Jiewen >>> *Cc:* Ni, Ruiyu; Laszlo Ersek; >>> [email protected]<mailto:[email protected]> >>> ; Tian, Feng; Zeng, Star; >>> [email protected]<mailto:[email protected]> ; >> [email protected]<mailto:[email protected]> ; Fan, >>> Jeff; Gao, Liming >>> *Subject:* Re: [edk2] [PATCH v3 2/4] IntelFrameworkModulePkg: BdsDxe: >>> only allocate below 4 GB if needed >>> >>> >>> >>> On 22 February 2016 at 13:40, Yao, Jiewen wrote: >>>> I did a search on current MdeModulePkg. I found there are more >>>> modules allocating Below4G memory. >>>> >>>> Besides BDS, we have BootGraphicsResourceTableDxe, >>>> BootScriptExecutorDxe, FirmwarePerformanceDataTableDxe, >>>> PiDxeS3BootScriptLib, CapsuleRuntimeDxe. >>>> >>>> >>>> >>>> BootGraphicsResourceTableDxe – I am not clear. >>>> >>>> CapsuleRuntimeDxe – Below4G is designed for PEI capsule. Or we do >>>> not need such capability. >>>> >>>> BootScriptExecutorDxe, FirmwarePerformanceDataTableDxe, >>>> PiDxeS3BootScriptLib >>>> - data need to be access in PEI phase. >>>> >>>> >>>> >>>> I think we might need more clean up for 32bit PEI. >>>> >>> >>> Agreed. >>> >>>> I am still thinking if we can reuse old Pcd – we can clarify in PCD >>>> description – it is only for 32bit PEI and 64bit DXE. >>>> >>>> Or a new PCD, and name to Pcd32BitPei? Please allow me collect more >>>> feedback on that. >>>> >>> >>> OK >>> >>>> In order to separate 32bit PEI issue from ACPI issue, I think we >>>> can check in ACPI patch at first, if you want. >>>> >>> >>> Should I wait for the maintainers of MdeModulePkg to give their >>> Reviewed-by ? >>> >> > > _______________________________________________ > edk2-devel mailing list > [email protected]<mailto:[email protected]> > https://lists.01.org/mailman/listinfo/edk2-devel > _______________________________________________ edk2-devel mailing list [email protected]<mailto:[email protected]> https://lists.01.org/mailman/listinfo/edk2-devel _______________________________________________ edk2-devel mailing list [email protected] https://lists.01.org/mailman/listinfo/edk2-devel

