On 06/16/15 14:54, Maoming wrote:
> 
> 
> -----Original Message-----
> From: Laszlo Ersek [mailto:ler...@redhat.com] 
> Sent: June 15, 2015 22:08
> To: Maoming
> Cc: edk2-devel@lists.sourceforge.net; Huangpeng (Peter); Wei Liu; Paolo 
> Bonzini
> Subject: Re: Re: [edk2] [RFC 4/4] OvmfPkg: PlatformPei: invert MTRR setup in 
> QemuInitializeRam()
> 
> On 06/15/15 15:25, Maoming wrote:
>> Hi :
>> Sorry for the late reply.
>> I tested the patch series using 64G and 80G.
>> Both of them are OK in XEN.
>>
>> Here is what it looks like inside the VM (the memory is 80G):
>>               total       used       free     shared    buffers     cached
>>  Mem:      81956412     654708   81301704          0      10528      42256
>>  -/+ buffers/cache:     601924   81354488
>>  Swap:      4186108          0    4186108
>>  
>>  Thanks a lot for your nice work!
>>  Maoming
> 
> Thanks for reporting back!
> 
> Since you mentioned earlier that you encountered the problem on qemu/KVM
> too -- can you please give that a whirl as well, with this patch series
> in place?
> 
> Thank you
> Laszlo
> 
> 
>  The patch series works well in KVM too.
>  My environment is :
>  version:        kvm-kmod-3.6
>  QEMU emulator version 2.1.0
>  
>  Here is what it looks like inside the VM (the memory is 90G):
>               total       used       free     shared    buffers     cached
>  Mem:      92862616    1155156   91707460          0      13552      77952
>  -/+ buffers/cache:    1063652   91798964
>  Swap:      4063224          0    4063224
> 
> Thanks!
> Maoming

Great, thank you.

I'll add Wei Liu's Tested-by to patches #1 and #2 (because the other two
patches don't affect Xen), and I will add your Tested-by to all four
patches. I'll update the commit message of patch #4 and I'll resend the
series as PATCH, not RFC.

Cheers!
Laszlo


> 
> 
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:ler...@redhat.com] 
>> Sent: June 10, 2015 21:03
>> To: Maoming
>> Cc: edk2-devel@lists.sourceforge.net; Huangpeng (Peter); Wei Liu; Paolo 
>> Bonzini
>> Subject: Re: [edk2] [RFC 4/4] OvmfPkg: PlatformPei: invert MTRR setup in 
>> QemuInitializeRam()
>>
>> On 06/09/15 04:15, Laszlo Ersek wrote:
>>> On 06/08/15 23:46, Laszlo Ersek wrote:
>>>> At the moment we work with a UC default MTRR type, and set three 
>>>> memory ranges to WB:
>>>> - [0, 640 KB),
>>>> - [1 MB, LowerMemorySize),
>>>> - [4 GB, 4 GB + UpperMemorySize).
>>>>
>>>> Unfortunately, coverage for the third range can fail with a high 
>>>> likelihood. If the alignment of the base (ie. 4 GB) and the alignment 
>>>> of the size (UpperMemorySize) differ, then MtrrLib creates a series 
>>>> of variable MTRR entries, with power-of-two sized MTRR masks. And, 
>>>> it's really easy to run out of variable MTRR entries, dependent on 
>>>> the alignment difference.
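To make the alignment problem concrete, here is a minimal sketch that counts how many naturally aligned, power-of-two sized blocks a simple greedy split needs to cover [base, base + size). It is an illustration only: the function names are hypothetical, and the real MtrrLib algorithm is assumed to differ in its details. Each block stands for one variable MTRR entry, and the number of entries available is limited (the processor reports it in IA32_MTRRCAP; eight is a common value).

```c
#include <stdint.h>

#define KIB (1ULL << 10)
#define GIB (1ULL << 30)

/* Largest power of two that is <= x. The caller guarantees x != 0. */
static uint64_t floor_pow2(uint64_t x)
{
    uint64_t p = 1;
    while (p <= x / 2) {
        p *= 2;
    }
    return p;
}

/*
 * Hypothetical helper: number of naturally aligned power-of-two blocks
 * a greedy split uses to cover [base, base + size).
 */
unsigned count_pow2_blocks(uint64_t base, uint64_t size)
{
    unsigned count = 0;

    while (size != 0) {
        /* A block cannot be larger than what is left to cover... */
        uint64_t block = floor_pow2(size);

        /* ...nor larger than the natural alignment of its base. */
        if (base != 0) {
            uint64_t align = base & (~base + 1); /* lowest set bit */
            if (align < block) {
                block = align;
            }
        }

        base += block;
        size -= block;
        count++;
    }
    return count;
}
```

Under these assumptions, a 4 GB range starting at 4 GB needs a single block, while the 77 GB that an 80 GB guest places above 4 GB already needs six, and any extra unaligned megabytes push the count higher.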
>>>>
>>>> This is a problem because a Linux guest will loudly reject any high 
>>>> memory that is not covered by MTRR.
>>>>
>>>> So, let's follow the inverse pattern (loosely inspired by SeaBIOS):
>>>> - flip the MTRR default type to WB,
>>>> - set [0, 640 KB) to WB -- fixed MTRRs have precedence over the default
>>>>   type and variable MTRRs, so we can't avoid this,
>>>> - set [640 KB, 1 MB) to UC -- implemented with fixed MTRRs,
>>>> - set [LowerMemorySize, 4 GB) to UC -- should succeed with variable MTRRs
>>>>   more likely than the other scheme (due to less chaotic alignment
>>>>   differences).
>>>>
>>>> Effects of this patch can be observed by setting DEBUG_CACHE 
>>>> (0x00200000) in PcdDebugPrintErrorLevel.
>>>>
>>>> BUG: Although the MTRRs look good to me in the OVMF debug log, I 
>>>> still can't boot >= 64 GB guests with this. Instead of the complaints 
>>>> mentioned above, the Linux guest apparently spirals into an infinite 
>>>> loop (on KVM), or hangs with no CPU load (on TCG).
>>>
>>> No, actually there is no bug in this patch (so s/RFC/PATCH/). I did 
>>> more testing and these are the findings:
>>> - I can reproduce the same issue on KVM with SeaBIOS guests.
>>> - The exact symptoms are that as soon as the highest guest-phys address
>>>   is >= 64 GB, then the guest kernel doesn't boot. It gets stuck
>>>   somewhere after hitting Enter in grub.
>>> - Normally 3 GB of the guest RAM is mapped under 4 GB in guest-phys
>>>   address space, then there's a 1 GB PCI hole, and the rest is above
>>>   4 GB. This means that a 63 GB guest can be started (because 63 - 3 + 4
>>>   == 64), but if you add just 1 MB more, it won't boot.
>>> - (This was the big discovery:) I flipped the "ept" parameter of the
>>>   kvm_intel module on my host to N, and then things started to work. I
>>>   just booted a 128 GB Linux guest with this patchset. (I have 4 GB
>>>   RAM in my host, plus approx 250 GB swap.) The guest could see it all.
>>> - The TCG boot didn't hang either; I just couldn't wait earlier for
>>>   network initialization to complete.
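The "63 - 3 + 4 == 64" arithmetic in the list above can be sketched as follows. This assumes exactly the layout described there (up to 3 GB of RAM below 4 GB, a 1 GB PCI hole, the remainder starting at 4 GB); the function name is hypothetical, and other machine types or configurations may split memory differently.

```c
#include <stdint.h>

#define MIB (1ULL << 20)
#define GIB (1ULL << 30)

/*
 * Hypothetical helper: exclusive end of guest RAM in guest-physical
 * address space, assuming at most 3 GB of RAM sits below 4 GB and the
 * rest is placed contiguously from 4 GB upward.
 */
uint64_t guest_phys_top(uint64_t ram_bytes)
{
    const uint64_t low_max = 3 * GIB;

    if (ram_bytes <= low_max) {
        return ram_bytes;           /* everything fits below the hole */
    }
    return 4 * GIB + (ram_bytes - low_max);
}
```

With this layout, guest_phys_top(63 * GIB) is exactly 64 GB, and adding a single megabyte moves the top of RAM past the 64 GB boundary.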
>>>
>>> I'm CC'ing Paolo for help with the EPT question. Other than that, this 
>>> series is functional. (For QEMU/KVM at least; Xen will likely need 
>>> more fixes from others.)
>>
>> We have a root cause, it seems. The issue is that the processor in my 
>> laptop, on which I tested, has only 36 bits for physical addresses:
>>
>>   $ grep 'address sizes' /proc/cpuinfo
>>   address sizes   : 36 bits physical, 48 bits virtual
>>   ...
>>
>> Which matches where the problem surfaces (64 GB guest-phys address
>> space) with hw-supported nested paging (EPT) enabled on the host.
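As a rough check of why 36 physical address bits line up with the 64 GB boundary, the sketch below compares the end of guest RAM against the range addressable with a given number of bits. The helper names are hypothetical, and the code only models the arithmetic, not how KVM or EPT actually enforce the limit.

```c
#include <stdint.h>

#define MIB (1ULL << 20)
#define GIB (1ULL << 30)

/* Size of the address space reachable with phys_bits bits (phys_bits < 64). */
uint64_t phys_addr_limit(unsigned phys_bits)
{
    return 1ULL << phys_bits;
}

/*
 * Returns 1 if a guest whose RAM ends at the exclusive address guest_top
 * stays within the host's physical address width, 0 otherwise.
 */
int fits_in_phys_bits(uint64_t guest_top, unsigned phys_bits)
{
    return guest_top <= phys_addr_limit(phys_bits);
}
```

With 36 bits the limit is 2^36 bytes, i.e. 64 GB, so a guest ending exactly at 64 GB fits and one ending 1 MB higher does not; with 46 bits the limit is 64 TB, far above anything tested in this thread.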
>>
>> In order to confirm this, a colleague of mine gave me access to a server 
>> with 96 GB of RAM, and:
>>
>>   address sizes      : 46 bits physical, 48 bits virtual
>>
>> On this host I booted a 72 GB OVMF guest on QEMU/KVM, with EPT enabled, and 
>> according to the guest dmesg, the guest saw it all.
>>
>>   Memory: 74160924K/75493820K available (7735K kernel code, 1149K
>>   rwdata, 3340K rodata, 1500K init, 1524K bss, 1332896K reserved, 0K
>>   cma-reserved)
>>
>> Maoming: since you reported this issue, please confirm that the patch series 
>> resolves it for you as well. In that case, I'll repost the series with 
>> "PATCH" as subject-prefix instead of "RFC", and I'll drop the BUG note from 
>> the last commit message.
>>
>> Thanks
>> Laszlo
>>
>>>> Cc: Maoming <maoming.maom...@huawei.com>
>>>> Cc: Huangpeng (Peter) <peter.huangp...@huawei.com>
>>>> Cc: Wei Liu <wei.l...@citrix.com>
>>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>>> Signed-off-by: Laszlo Ersek <ler...@redhat.com>
>>>> ---
>>>>  OvmfPkg/PlatformPei/MemDetect.c | 43 +++++++++++++++++++++++++++++++++++++----
>>>>  1 file changed, 39 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/OvmfPkg/PlatformPei/MemDetect.c b/OvmfPkg/PlatformPei/MemDetect.c
>>>> index 3ceb142..cceab22 100644
>>>> --- a/OvmfPkg/PlatformPei/MemDetect.c
>>>> +++ b/OvmfPkg/PlatformPei/MemDetect.c
>>>> @@ -194,6 +194,8 @@ QemuInitializeRam (
>>>>  {
>>>>    UINT64                      LowerMemorySize;
>>>>    UINT64                      UpperMemorySize;
>>>> +  MTRR_SETTINGS               MtrrSettings;
>>>> +  EFI_STATUS                  Status;
>>>>  
>>>>    DEBUG ((EFI_D_INFO, "%a called\n", __FUNCTION__));
>>>>  
>>>> @@ -214,12 +216,45 @@ QemuInitializeRam (
>>>>      }
>>>>    }
>>>>  
>>>> -  MtrrSetMemoryAttribute (BASE_1MB, LowerMemorySize - BASE_1MB, CacheWriteBack);
>>>> +  //
>>>> +  // We'd like to keep the following ranges uncached:
>>>> +  // - [640 KB, 1 MB)
>>>> +  // - [LowerMemorySize, 4 GB)
>>>> +  //
>>>> +  // Everything else should be WB. Unfortunately, programming the inverse (ie.
>>>> +  // keeping the default UC, and configuring the complement set of the above as
>>>> +  // WB) is not reliable in general, because the end of the upper RAM can have
>>>> +  // practically any alignment, and we may not have enough variable MTRRs to
>>>> +  // cover it exactly.
>>>> +  //
>>>> +  if (IsMtrrSupported ()) {
>>>> +    MtrrGetAllMtrrs (&MtrrSettings);
>>>>  
>>>> -  MtrrSetMemoryAttribute (0, BASE_512KB + BASE_128KB, CacheWriteBack);
>>>> +    //
>>>> +    // MTRRs disabled, fixed MTRRs disabled, default type is uncached
>>>> +    //
>>>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT11) == 0);
>>>> +    ASSERT ((MtrrSettings.MtrrDefType & BIT10) == 0);
>>>> +    ASSERT ((MtrrSettings.MtrrDefType & 0xFF) == 0);
>>>>  
>>>> -  if (UpperMemorySize != 0) {
>>>> -    MtrrSetMemoryAttribute (BASE_4GB, UpperMemorySize, CacheWriteBack);
>>>> +    //
>>>> +    // flip default type to writeback
>>>> +    //
>>>> +    SetMem (&MtrrSettings.Fixed, sizeof MtrrSettings.Fixed, 0x06);
>>>> +    ZeroMem (&MtrrSettings.Variables, sizeof MtrrSettings.Variables);
>>>> +    MtrrSettings.MtrrDefType |= BIT11 | BIT10 | 6;
>>>> +    MtrrSetAllMtrrs (&MtrrSettings);
>>>> +
>>>> +    //
>>>> +    // punch holes
>>>> +    //
>>>> +    Status = MtrrSetMemoryAttribute (BASE_512KB + BASE_128KB,
>>>> +               SIZE_256KB + SIZE_128KB, CacheUncacheable);
>>>> +    ASSERT_EFI_ERROR (Status);
>>>> +
>>>> +    Status = MtrrSetMemoryAttribute (LowerMemorySize,
>>>> +               SIZE_4GB - LowerMemorySize, CacheUncacheable);
>>>> +    ASSERT_EFI_ERROR (Status);
>>>>    }
>>>>  }
>>>>  
>>>>
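For readers unfamiliar with the bit arithmetic in the patch, here is a small standalone sketch of the IA32_MTRR_DEF_TYPE fields it manipulates: bit 11 enables MTRRs, bit 10 enables the fixed-range MTRRs, and the low byte holds the default memory type (0 is UC, 6 is WB). The macro and function names are hypothetical and are not part of EDK2 or MtrrLib.

```c
#include <stdint.h>

#define MTRR_DEF_TYPE_E     (1ULL << 11)  /* MTRRs enabled (BIT11)        */
#define MTRR_DEF_TYPE_FE    (1ULL << 10)  /* fixed MTRRs enabled (BIT10)  */
#define MTRR_DEF_TYPE_MASK  0xFFULL       /* default memory type field    */
#define MTRR_TYPE_UC        0ULL          /* uncacheable                  */
#define MTRR_TYPE_WB        6ULL          /* write-back                   */

int mtrr_enabled(uint64_t def_type)
{
    return (def_type & MTRR_DEF_TYPE_E) != 0;
}

int mtrr_fixed_enabled(uint64_t def_type)
{
    return (def_type & MTRR_DEF_TYPE_FE) != 0;
}

uint64_t mtrr_default_type(uint64_t def_type)
{
    return def_type & MTRR_DEF_TYPE_MASK;
}

/*
 * Mirrors the "|= BIT11 | BIT10 | 6" step of the patch. Like the patch,
 * it relies on the type field being 0 (UC) beforehand, which the patch
 * checks with ASSERTs; OR-ing 6 into a nonzero type would not yield WB.
 */
uint64_t mtrr_flip_default_to_wb(uint64_t def_type)
{
    return def_type | MTRR_DEF_TYPE_E | MTRR_DEF_TYPE_FE | MTRR_TYPE_WB;
}
```

Starting from an all-zero register value (MTRRs disabled, default type UC), the flip produces 0xC06: both enable bits set and a default type of WB.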
>>>
>>
> 


------------------------------------------------------------------------------
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/edk2-devel
