On 06/15/15 15:25, Maoming wrote:
> Hi:
> Sorry for the late reply.
> I tested the patch series using 64G and 80G.
> Both of them are OK in XEN.
>
> Here is what it looks like inside the VM (the memory is 80G):
>              total       used       free     shared    buffers     cached
> Mem:      81956412     654708   81301704          0      10528      42256
> -/+ buffers/cache:     601924   81354488
> Swap:      4186108          0    4186108
>
> Thanks a lot for your nice work!
> Maoming

Thanks for reporting back! Since you mentioned earlier that you
encountered the problem on qemu/KVM too -- can you please give that a
whirl as well, with this patch series in place?

Thank you
Laszlo

> -----Original Message-----
> From: Laszlo Ersek [mailto:ler...@redhat.com]
> Sent: June 10, 2015 21:03
> To: Maoming
> Cc: edk2-devel@lists.sourceforge.net; Huangpeng (Peter); Wei Liu; Paolo
> Bonzini
> Subject: Re: [edk2] [RFC 4/4] OvmfPkg: PlatformPei: invert MTRR setup in
> QemuInitializeRam()
>
> On 06/09/15 04:15, Laszlo Ersek wrote:
>> On 06/08/15 23:46, Laszlo Ersek wrote:
>>> At the moment we work with a UC default MTRR type, and set three
>>> memory ranges to WB:
>>> - [0, 640 KB),
>>> - [1 MB, LowerMemorySize),
>>> - [4 GB, 4 GB + UpperMemorySize).
>>>
>>> Unfortunately, coverage for the third range can fail with a high
>>> likelihood. If the alignment of the base (ie. 4 GB) and the alignment
>>> of the size (UpperMemorySize) differ, then MtrrLib creates a series
>>> of variable MTRR entries, with power-of-two sized MTRR masks. And,
>>> it's really easy to run out of variable MTRR entries, dependent on
>>> the alignment difference.
>>>
>>> This is a problem because a Linux guest will loudly reject any high
>>> memory that is not covered by MTRR.
>>>
>>> So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
>>> - flip the MTRR default type to WB,
>>> - set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
>>>   type and variable MTRRs, so we can't avoid this,
>>> - set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
>>> - set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
>>>   more likely than the other scheme (due to less chaotic alignment
>>>   differences).
>>>
>>> Effects of this patch can be observed by setting DEBUG_CACHE
>>> (0x00200000) in PcdDebugPrintErrorLevel.
>>>
>>> BUG: Although the MTRRs look good to me in the OVMF debug log, I
>>> still can't boot >= 64 GB guests with this.
>>> Instead of the complaints mentioned above, the Linux guest apparently
>>> spirals into an infinite loop (on KVM), or hangs with no CPU load (on
>>> TCG).
>>
>> No, actually there is no bug in this patch (so s/RFC/PATCH/). I did
>> more testing and these are the findings:
>> - I can reproduce the same issue on KVM with SeaBIOS guests.
>> - The exact symptoms are that as soon as the highest guest-phys address
>>   is >= 64 GB, then the guest kernel doesn't boot. It gets stuck
>>   somewhere after hitting Enter in grub.
>> - Normally 3 GB of the guest RAM is mapped under 4 GB in guest-phys
>>   address space, then there's a 1 GB PCI hole, and the rest is above
>>   4 GB. This means that a 63 GB guest can be started (because 63 - 3 + 4
>>   == 64), but if you add just 1 MB more, it won't boot.
>> - (This was the big discovery:) I flipped the "ept" parameter of the
>>   kvm_intel module on my host to N, and then things started to work. I
>>   just booted a 128 GB Linux guest with this patchset. (I have 4 GB
>>   RAM in my host, plus approx 250 GB swap.) The guest could see it all.
>> - The TCG boot didn't hang either; I just couldn't wait earlier for
>>   network initialization to complete.
>>
>> I'm CC'ing Paolo for help with the EPT question. Other than that, this
>> series is functional. (For QEMU/KVM at least; Xen will likely need
>> more fixes from others.)
>
> We have a root cause, it seems. The issue is that the processor in my laptop,
> on which I tested, has only 36 bits for physical addresses:
>
> $ grep 'address sizes' /proc/cpuinfo
> address sizes : 36 bits physical, 48 bits virtual
> ...
>
> Which matches where the problem surfaces (64 GB guest-phys address
> space) with hw-supported nested paging (EPT) enabled on the host.
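
The arithmetic in the quoted findings (63 - 3 + 4 == 64, and 36 physical address bits giving a 64 GB limit) can be checked with a small standalone C sketch. It assumes the layout described in the thread (up to 3 GB of RAM below 4 GB, the rest placed from 4 GB upward) and treats 2^phys_bits as the hard ceiling for guest-physical addresses under EPT. The names guest_phys_top and fits_in_phys_bits are made up for illustration; they are not part of OVMF, QEMU, or KVM.

```c
#include <assert.h>
#include <stdint.h>

#define GIB ((uint64_t)1 << 30)

/* Exclusive end of the guest-physical address space for a guest with
 * ram_bytes of RAM. Assumption taken from the thread: at most 3 GB is
 * mapped below 4 GB, followed by a 1 GB PCI hole, and the remaining RAM
 * starts at 4 GB. */
uint64_t guest_phys_top(uint64_t ram_bytes)
{
    const uint64_t low_max = 3 * GIB;

    if (ram_bytes <= low_max) {
        return ram_bytes;
    }
    return 4 * GIB + (ram_bytes - low_max);
}

/* Returns 1 if every guest-physical address of the guest is
 * representable in phys_bits bits, 0 otherwise. */
int fits_in_phys_bits(uint64_t ram_bytes, unsigned phys_bits)
{
    assert(phys_bits < 64);  /* keeps the shift below well defined */
    return guest_phys_top(ram_bytes) <= ((uint64_t)1 << phys_bits);
}
```

Under these assumptions, fits_in_phys_bits(63 * GIB, 36) is 1, while adding a single megabyte makes it 0, which matches the observed boundary. With 46 physical bits a 72 GB guest fits comfortably.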
>
> In order to confirm this, a colleague of mine gave me access to a server with
> 96 GB of RAM, and:
>
> address sizes : 46 bits physical, 48 bits virtual
>
> On this host I booted a 72 GB OVMF guest on QEMU/KVM, with EPT enabled, and
> according to the guest dmesg, the guest saw it all.
>
> Memory: 74160924K/75493820K available (7735K kernel code, 1149K
> rwdata, 3340K rodata, 1500K init, 1524K bss, 1332896K reserved, 0K
> cma-reserved)
>
> Maoming: since you reported this issue, please confirm that the patch series
> resolves it for you as well. In that case, I'll repost the series with
> "PATCH" as subject-prefix instead of "RFC", and I'll drop the BUG note from
> the last commit message.
>
> Thanks
> Laszlo
>
>>> Cc: Maoming <maoming.maom...@huawei.com>
>>> Cc: Huangpeng (Peter) <peter.huangp...@huawei.com>
>>> Cc: Wei Liu <wei.l...@citrix.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Laszlo Ersek <ler...@redhat.com>
>>> ---
>>>  OvmfPkg/PlatformPei/MemDetect.c | 43 +++++++++++++++++++++++++++++++++++++----
>>>  1 file changed, 39 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
>>> index 3ceb142..cceab22 100644
>>> --- a/OvmfPkg/PlatformPei/MemDetect.c
>>> +++ b/OvmfPkg/PlatformPei/MemDetect.c
>>> @@ -194,6 +194,8 @@ QemuInitializeRam (
>>>  {
>>>    UINT64                      LowerMemorySize;
>>>    UINT64                      UpperMemorySize;
>>> +  MTRR_SETTINGS               MtrrSettings;
>>> +  EFI_STATUS                  Status;
>>>
>>>    DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
>>>
>>> @@ -214,12 +216,45 @@ QemuInitializeRam (
>>>      }
>>>    }
>>>
>>> -  MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
>>> +  //
>>> +  // We'd like to keep the following ranges uncached:
>>> +  // - [640 KB, 1 MB)
>>> +  // - [LowerMemorySize, 4 GB)
>>> +  //
>>> +  // Everything else should be WB. Unfortunately, programming the inverse (ie.
>>> +  // keeping the default UC, and configuring the complement set of the above as
>>> +  // WB) is not reliable in general, because the end of the upper RAM can have
>>> +  // practically any alignment, and we may not have enough variable MTRRs to
>>> +  // cover it exactly.
>>> +  //
>>> +  if (IsMtrrSupported ()) {
>>> +    MtrrGetAllMtrrs (&MtrrSettings);
>>>
>>> -  MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
>>> +    //
>>> +    // MTRRs disabled, fixed MTRRs disabled, default type is uncached
>>> +    //
>>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
>>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
>>> +    ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
>>>
>>> -  if (UpperMemorySize != 0) {
>>> -    MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
>>> +    //
>>> +    // flip default type to writeback
>>> +    //
>>> +    SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
>>> +    ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
>>> +    MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
>>> +    MtrrSetAllMtrrs (&MtrrSettings);
>>> +
>>> +    //
>>> +    // punch holes
>>> +    //
>>> +    Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
>>> +               SIZE_256KB + SIZE_128KB, CacheUncacheable);
>>> +    ASSERT_EFI_ERROR (Status);
>>> +
>>> +    Status = MtrrSetMemoryAttribute (LowerMemorySize,
>>> +               SIZE_4GB - LowerMemorySize, CacheUncacheable);
>>> +    ASSERT_EFI_ERROR (Status);
>>>    }
>>>  }
>>>
>>>
>>
>
------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
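
To illustrate why the alignment of the upper RAM range matters, here is a simplified model in standalone C. It greedily splits [base, base + size) into naturally aligned power-of-two blocks, one block per variable MTRR. This is only a sketch of the general constraint on variable MTRRs and is an assumption on my part about how to model it; it is not MtrrLib's actual algorithm, and count_aligned_blocks is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides x (x != 0), i.e. its lowest set bit. */
static uint64_t lowest_set_bit(uint64_t x)
{
    return x & (~x + 1);
}

/* Largest power of two that is <= x (x != 0). */
static uint64_t floor_pow2(uint64_t x)
{
    uint64_t p = 1;

    while (p <= x / 2) {
        p *= 2;
    }
    return p;
}

/* Number of naturally aligned power-of-two blocks needed to tile
 * [base, base + size) exactly, always taking the largest block that is
 * both aligned at the current base and no larger than what remains. */
unsigned count_aligned_blocks(uint64_t base, uint64_t size)
{
    unsigned n = 0;

    assert(size <= UINT64_MAX - base);  /* range must not wrap around */

    while (size != 0) {
        uint64_t block = floor_pow2(size);

        if (base != 0 && lowest_set_bit(base) < block) {
            block = lowest_set_bit(base);
        }
        base += block;
        size -= block;
        n++;
    }
    return n;
}
```

In this model, an upper range of [4 GB, 8 GB) needs one block, 61 GB of upper RAM starting at 4 GB needs five, and an upper range that ends 1 MB short of 9 GB needs eleven. The inverted scheme's [3 GB, 4 GB) hole needs a single block, which is the "less chaotic alignment" the commit message refers to.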