On 02/23/16 16:09, Yao, Jiewen wrote:
> Ok, thanks for clarification.
>
> I just answer the last question below:
>
> - How does the Conclusion follow from Premises 1 through 4? What
> guarantees it?
>
> - Premise 1:
> OVMF installs memory that is >= 4GB as an untested memory resource
> descriptor HOB.
>
> - Premise 2:
> OVMF creates a memory type information HOB that is out of date, and will
> *always* be out of date.
> (It is simply impossible to keep that HOB up-to-date, or to reliably
> detect when it becomes stale.)
>
> - Premise 3:
> All of the guest RAM, including >= 4GB, is available to UEFI drivers and
> applications.
>
> - Premise 4:
> The affected drivers will be allowed to allocate memory >= 4GB if
> PcdDxeIplSwitchToLongMode is FALSE.
>
> - Conclusion:
> No memory area that is accessed during PEI on the S3 resume path is
> allocated from >= 4GB memory during DXE and BDS on the normal boot path.
>
> [Jiewen] Premise 2 is very important. It will cause DxeCore always find
> a piece of memory to pre-allocate BIN. It means DxeCore will **always**
> find memory in BIN, to allocate ACPI/Reserved/RT memory.
>
> BDS will auto update the memory type info hob into a variable. The
> proper way is to get variable and sync variable data to Hob.

I think you may have misunderstood my Premise 2. Premise 2 says that
OVMF does *not* produce a usable, valid memory type information HOB.
Hence it does not enable the DXE core to preallocate a bin that is
guaranteed to satisfy all allocations made by DXE drivers.

If I understand correctly, the idea is that the BDS collects the actual
usage statistics and writes them to a non-volatile variable. Then, at
next boot, the PEI phase can employ the read-only variable PEIM to fetch
this variable, and create a HOB with the information written by the last
boot's BDS.

There are several problems with this idea for OVMF.

First, if the actual allocations don't fit in the preallocated bins,
then the Conclusion (i.e., that the allocations stay below 4GB) cannot
be guaranteed. Therefore the guest OS must not be launched from BDS,
because S3 resume will break, and the OS will not come back up.

Second, if the OS is launched for the very first time, on an empty
variable store, then there is no previous measurement to base the HOB
on, in PEI. This takes us back to the case where the memory type
information HOB has to be initialized correctly from static code.

Third, even if there is a memory type information variable in the
varstore, written by previous executions of BDS, the variable store
might be lost or deleted (after all it is just another file on the
virtualization host). This is not supposed to render the virtual machine
inoperable; I think the various recovery options and UEFI applications
exist for that reason. However, as far as the memory type info HOB is
concerned, this case is identical to the one above -- no previous
measurement available, HOB is initialized statically from code.

(I'll only mention in passing that OVMF doesn't even include VariablePei
at the moment -- there's been no reason for including it.)

> If Premise 1 is satisfied, DxeCore will use tested Below4G memory as
> BIN, to pre-allocate ACPI/Reserved/RT memory. This is one of first steps
> in DxeCore initialization.
>
> Premise 3 is not related, because memory test driver will **test**
> untested memory. After that DxeCore **can** use that. It does not mean
> DxeCore **has to** allocate above 4G. DxeCore still looks for BIN
> location as first priority.
>
> Premise 4 is correct only on BS/Loader memory, because of TopDown
> policy. For ACPI/Reserved/RT memory, DxeCore looks for BIN location as
> first priorty. Because BIN is below 4G, ACPI/Reserved/RT memory is below 4G.
>
> Conclusion: since PEI S3 phase only access ACPI/Reserved memory and they
> are below 4G, it is safe.

How does the argument change if you factor in that Premise 2 actually
says that OVMF's memory type information HOB is *unreliable* in reality?

I'd like the argument to work without the memory type information HOB
being up to date. Or, minimally, there should be a way to *notice* that
the allocations couldn't be satisfied from below 4GB (implying that S3
resume will break), and in that case, do not allow BDS to launch the OS.

I vaguely recall that BDS can automatically reboot the machine if it
finds that the memory type information / statistics have changed (that
is, the actual allocations couldn't be satisfied from the preallocated
bins). Is that correct?

> I completely understand the whole process is complicated, so we
> describe memory type information table in
>
> https://firmware.intel.com/sites/default/files/resources/A_Tour_Beyond_BIOS_Memory_Map_in%20UEFI_BIOS.pdf,
> “Memory Map in S4 resume” section.

Two questions here:

* Yes, it has been my understanding that the memory type info HOB is
  related to S4. But S4 has never been relevant for OVMF, so I never
  cared about this HOB. Is it a good idea to force all users of S3 to
  work with this HOB, even if they don't care about S4, only S3?

* I find the description and the diagram in this section of the
  whitepaper helpful.
  Especially the statement that BDS is able to detect the case when the
  preallocated bins were too small, and in that case it can reset the
  platform. This should even cover the case when the variable store is
  lost or deleted, and the static defaults take effect for the HOB
  again.

  *However*, the practice with OVMF disagrees. Namely, as I mentioned
  earlier, the memory type information HOB that OVMF currently installs
  -- always from static defaults -- is *always* too small in comparison
  to the actual usage! If you check the stats I pasted earlier,

  - for Reserved memory, the actual usage can be 10 times as high as the
    current static preallocation;

  - for AcpiNVS memory (numeric type 0x0A, used in the whitepaper's
    example), the actual usage can be 80 times as high as the current
    preallocation;

  - for runtime data memory, the actual usage can be 25 times as high as
    the current preallocation.

  The fact that the preallocated bins are *always* too small implies
  that the BDS should *always* reboot the (virtual) machine -- guest
  OSes should be practically unbootable on OVMF. Yet this never happens;
  OVMF boots guest OSes just fine. This sort of decreases my trust that
  BDS will actually prevent an OS from being booted if the preallocated
  bins are too small.

  Are we missing something in OVMF?

> Please let me know if it is clear enough. If not, I can give more
> description.

My general preference would be to eliminate the memory type information
HOB from the equation. First, I think it was originally defined for S4
purposes, not S3. (And S4 is out of scope for OVMF.) Second, the BDS
failsafe that the whitepaper describes doesn't seem to be active with
OVMF in practice.

I'm certainly willing to investigate this avenue further (and thank you
for the education thus far), but please let's keep the PCD option on the
table still.
By that I don't mean *another* new PCD, beyond the ACPI version PCD that
Ard's patch series already contains -- I mean a more generic PCD that
*replaces* the ACPI version PCD, and controls the allocations across all
drivers.

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
https://lists.01.org/mailman/listinfo/edk2-devel
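
For illustration only, the failsafe discussed in this message (noticing
that actual usage did not fit in the preallocated bins, so that the OS
is not launched) can be sketched in standalone C. This is not the edk2
implementation. MEM_TYPE_INFO, PagesForType() and AllBinsSufficient()
are hypothetical names, plain <stdint.h> types stand in for the edk2
ones, and treating a memory type without a preallocated bin as a bin of
zero pages is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memory type information entry:
   a memory type and a page count (bin size or measured usage). */
typedef struct {
  uint32_t Type;
  uint32_t NumberOfPages;
} MEM_TYPE_INFO;

/* Return the page count recorded for Type, or 0 if Type is not listed. */
uint32_t
PagesForType (const MEM_TYPE_INFO *Table, size_t Count, uint32_t Type)
{
  size_t Index;

  assert (Table != NULL || Count == 0);
  for (Index = 0; Index < Count; Index++) {
    if (Table[Index].Type == Type) {
      return Table[Index].NumberOfPages;
    }
  }
  return 0;
}

/* Return true only if, for every measured type, the actual usage fits
   in the preallocated bin. Assumption: a type with no bin has size 0,
   so any usage of that type counts as an overflow. */
bool
AllBinsSufficient (
  const MEM_TYPE_INFO *Prealloc, size_t PreallocCount,
  const MEM_TYPE_INFO *Actual,   size_t ActualCount
  )
{
  size_t Index;

  assert (Actual != NULL || ActualCount == 0);
  for (Index = 0; Index < ActualCount; Index++) {
    if (Actual[Index].NumberOfPages >
        PagesForType (Prealloc, PreallocCount, Actual[Index].Type)) {
      return false;   /* this bin was too small */
    }
  }
  return true;
}
```

Under these assumptions, a platform that wanted the guarantee would
refuse to launch the OS (or reset) whenever AllBinsSufficient() returns
false for the types that PEI touches on the S3 resume path.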

