Hi :
Sorry for the late reply.
I tested the patch series using 64G and 80G.
Both of them are OK in XEN.

Here is what it looks like inside the VM (the memory is 80G):
                             total       used       free     shared    buffers  
   cached
 Mem:       81956412     654708   81301704          0      10528      42256
 -/+ buffers/cache:      601924   81354488
 Swap:      4186108          0    4186108
 
 Thanks a lot for your nice work!
 Maoming


-----邮件原件-----
发件人: Laszlo Ersek [mailto:ler...@redhat.com] 
发送时间: 2015年6月10日 21:03
收件人: Maoming
抄送: edk2-devel@lists.sourceforge.net; Huangpeng (Peter); Wei Liu; Paolo Bonzini
主题: Re: [edk2] [RFC 4/4] OvmfPkg: PlatformPei: invert MTRR setup in 
QemuInitializeRam()

On 06/09/15 04:15, Laszlo Ersek wrote:
> On 06/08/15 23:46, Laszlo Ersek wrote:
>> At the moment we work with a UC default MTRR type, and set three 
>> memory ranges to WB:
>> - [0, 640 KB),
>> - [1 MB, LowerMemorySize),
>> - [4 GB, 4 GB + UpperMemorySize).
>>
>> Unfortunately, coverage for the third range can fail with a high 
>> likelihood. If the alignment of the base (ie. 4 GB) and the alignment 
>> of the size (UpperMemorySize) differ, then MtrrLib creates a series 
>> of variable MTRR entries, with power-of-two sized MTRR masks. And, 
>> it's really easy to run out of variable MTRR entries, dependent on 
>> the alignment difference.
>>
>> This is a problem because a Linux guest will loudly reject any high 
>> memory that is not covered my MTRR.
>>
>> So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
>> - flip the MTRR default type to WB,
>> - set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
>>   type and variable MTRRs, so we can't avoid this,
>> - set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
>> - set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
>>   more likely than the other scheme (due to less chaotic alignment
>>   differences).
>>
>> Effects of this patch can be observed by setting DEBUG_CACHE 
>> (0x00200000) in PcdDebugPrintErrorLevel.
>>
>> BUG: Although the MTRRs look good to me in the OVMF debug log, I 
>> still can't boot >= 64 GB guests with this. Instead of the complaints 
>> mentioned above, the Linux guest apparently spirals into an infinite 
>> loop (on KVM), or hangs with no CPU load (on TCG).
> 
> No, actually there is no bug in this patch (so s/RFC/PATCH/). I did 
> more testing and these are the findings:
> - I can reproduce the same issue on KVM with SeaBIOS guests.
> - The exact symptoms are that as soon as the highest guest-phys address
>   is >= 64 GB, then the guest kernel doesn't boot. It gets stuck
>   somewhere after hitting Enter in grub.
> - Normally 3 GB of the guest RAM is mapped under 4 GB in guest-phys
>   address space, then there's a 1 GB PCI hole, and the rest is above
>   4 GB. This means that a 63 GB guest can be started (because 63 - 3 + 4
>   == 64), but if you add just 1 MB more, it won't boot.
> - (This was the big discovery:) I flipped the "ept" parameter of the
>   kvm_intel module on my host to N, and then things started to work. I
>   just booted a 128 GB Linux guest with this patchset. (I have 4 GB
>   RAM in my host, plus approx 250 GB swap.) The guest could see it all.
> - The TCG boot didn't hang either; I just couldn't wait earlier for
>   network initialization to complete.
> 
> I'm CC'ing Paolo for help with the EPT question. Other than that, this 
> series is functional. (For QEMU/KVM at least; Xen will likely need 
> more fixes from others.)

We have a root cause, it seems. The issue is that the processor in my laptop, 
on which I tested, has only 36 bits for physical addresses:

  $ grep 'address sizes' /proc/cpuinfo
  address sizes   : 36 bits physical, 48 bits virtual
  ...

Which matches where the problem surfaces (64 GB guest-phys address
space) with hw-supported nested paging (EPT) enabled on the host.

In order to confirm this, a colleague of mine gave me access to a server with 
96 GB of RAM, and:

  address sizes : 46 bits physical, 48 bits virtual

On this host I booted a 72 GB OVMF guest on QEMU/KVM, with EPT enabled, and 
according to the guest dmesg, the guest saw it all.

  Memory: 74160924K/75493820K available (7735K kernel code, 1149K
  rwdata, 3340K rodata, 1500K init, 1524K bss, 1332896K reserved, 0K
  cma-reserved)

Maoming: since you reported this issue, please confirm that the patch series 
resolves it for you as well. In that case, I'll repost the series with "PATCH" 
as subject-prefix instead of "RFC", and I'll drop the BUG note from the last 
commit message.

Thanks
Laszlo

>> Cc: Maoming <maoming.maom...@huawei.com>
>> Cc: Huangpeng (Peter) <peter.huangp...@huawei.com>
>> Cc: Wei Liu <wei.l...@citrix.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Laszlo Ersek <ler...@redhat.com>
>> ---
>>  OvmfPkg/PlatformPei/MemDetect.c | 43 
>> +++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 39 insertions(+), 4 deletions(-)
>>
>> diff --git a/OvmfPkg/PlatformPei/MemDetect.c 
>> b/OvmfPkg/PlatformPei/MemDetect.c index 3ceb142..cceab22 100644
>> --- a/OvmfPkg/PlatformPei/MemDetect.c
>> +++ b/OvmfPkg/PlatformPei/MemDetect.c
>> @@ -194,6 +194,8 @@ QemuInitializeRam (  {
>>    UINT64                      LowerMemorySize;
>>    UINT64                      UpperMemorySize;
>> +  MTRR_SETTINGS               MtrrSettings;
>> +  EFI_STATUS                  Status;
>>  
>>    DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
>>  
>> @@ -214,12 +216,45 @@ QemuInitializeRam (
>>      }
>>    }
>>  
>> -  MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, 
>> CacheWriteBack);
>> +  //
>> +  // We'd like to keep the following ranges uncached:
>> +  // - [640 KB, 1 MB)
>> +  // - [LowerMemorySize, 4 GB)
>> +  //
>> +  // Everything else should be WB. Unfortunately, programming the inverse 
>> (ie.
>> +  // keeping the default UC, and configuring the complement set of 
>> + the above as  // WB) is not reliable in general, because the end of 
>> + the upper RAM can have  // practically any alignment, and we may 
>> + not have enough variable MTRRs to  // cover it exactly.
>> +  //
>> +  if (IsMtrrSupported ()) {
>> +    MtrrGetAllMtrrs (&MtrrSettings);
>>  
>> -  MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, 
>> CacheWriteBack);
>> +    //
>> +    // MTRRs disabled, fixed MTRRs disabled, default type is uncached
>> +    //
>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
>> +    ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
>>  
>> -  if (UpperMemorySize != 0) {
>> -    MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
>> +    //
>> +    // flip default type to writeback
>> +    //
>> +    SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
>> +    ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
>> +    MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
>> +    MtrrSetAllMtrrs (&MtrrSettings);
>> +
>> +    //
>> +    // punch holes
>> +    //
>> +    Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
>> +               SIZE_256KB + SIZE_128KB, CacheUncacheable);
>> +    ASSERT_EFI_ERROR (Status);
>> +
>> +    Status = MtrrSetMemoryAttribute (LowerMemorySize,
>> +               SIZE_4GB - LowerMemorySize, CacheUncacheable);
>> +    ASSERT_EFI_ERROR (Status);
>>    }
>>  }
>>  
>>
> 

------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel

Reply via email to