On 10/30/15 14:39, Laszlo Ersek wrote:
> On 10/30/15 14:04, Janusz Mocek wrote:
>> On 30.10.2015 at 13:26, Laszlo Ersek wrote:
>>> CC'ing Xiao and Alex again.
>>>
>>> On 10/29/15 19:39, Jordan Justen wrote:
>>>> On 2015-10-29 04:45:37, Laszlo Ersek wrote:
>>>>> On 10/29/15 02:32, Jordan Justen wrote:
>>>>>> +    ASSERT (MaxProcessors > 0);
>>>>>> +    PcdSet32 (PcdCpuMaxLogicalProcessorNumber, MaxProcessors);
>>>>> I think that when this branch is active, then
>>>>> PcdCpuApInitTimeOutInMicroSeconds should *also* be set, namely to
>>>>> MAX_UINT32 (~71 minutes, the closest we can get to "infinity"). When
>>>>> this hint is available from QEMU, then we should practically disable
>>>>> the timeout option in CpuDxe's AP counting.
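
For reference, the "~71 minutes" figure is just the 32-bit microsecond
range. A minimal sketch of that arithmetic follows; the helper names are
made up for illustration and are not edk2 APIs, and the per-processor
scaling is only one possible reading of the "timeout based on the number
of processors" idea, not something the patch implements.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a 32-bit microsecond timeout can hold (edk2: MAX_UINT32). */
#define AP_TIMEOUT_MAX_US 0xFFFFFFFFu

/* Hypothetical helper: whole minutes contained in a microsecond count. */
uint32_t timeout_whole_minutes(uint32_t microseconds)
{
    return microseconds / (60u * 1000000u);
}

/* Hypothetical helper: whole seconds left over after the whole minutes. */
uint32_t timeout_leftover_seconds(uint32_t microseconds)
{
    return (microseconds % (60u * 1000000u)) / 1000000u;
}

/*
 * Hypothetical alternative: scale the timeout with the processor count,
 * saturating at the 32-bit maximum instead of wrapping around.
 */
uint32_t scaled_timeout_us(uint32_t processors, uint32_t per_cpu_us)
{
    assert(processors > 0); /* mirrors ASSERT (MaxProcessors > 0) */
    uint64_t total = (uint64_t)processors * per_cpu_us;
    return total > AP_TIMEOUT_MAX_US ? AP_TIMEOUT_MAX_US : (uint32_t)total;
}
```

With these helpers, the maximum works out to 71 whole minutes plus 34
seconds, and 8 processors at an assumed 100 ms each would give 800000
microseconds.
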
>>>> I think this is a good idea, but I don't think 71 minutes is useful.
>>>> Perhaps 30 seconds? This seems more than adequate for hundreds of
>>>> processors to startup. Or perhaps some timeout based on the number of
>>>> processors?
>>>>
>>>> Janusz and I were discussing
>>>> https://github.com/tianocore/edk2/issues/21 on irc. We increased the
>>>> timeout to 10 seconds, and with only 8 processors it was still timing
>>>> out.
>>>>
>>>> Obviously we are somehow failing to start the processors correctly, or
>>>> QEMU/KVM is doing something wrong.
>>>>
>>>> Have you been able to reproduce this issue? It seems like we need to
>>>> set the timeout to 71 minutes, and then debug QEMU/KVM to see what
>>>> state the APs are in...
>>>>
>>>> Unfortunately I haven't yet been able to reproduce the bug on my
>>>> system. :(
>>> I've been staring at the following things for a few tens of minutes now:
>>>
>>> (1) Kernel commit b18d5431acc7. Note that the commit changes the return
>>>     value of the vmx_get_mt_mask() function *exactly* in the following
>>>     case:
>>>
>>>       kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
>>>       (kvm_read_cr0(vcpu) & X86_CR0_CD)
>>>
>>>     The first sub-condition is satisfied by GPU passthrough / device
>>>     assignment, I think; the second part depends on the VCPU having
>>>     turned on (or having *left* on) CR0.CD.
>>>
>>> (2) Consult the vmx_vcpu_reset() function in "arch/x86/kvm/vmx.c"
>>>     (current upstream). You will find:
>>>
>>>     cr0 = X86_CR0_NW | X86_CR0_CD | X86_CR0_ET;
>>>     vmx_set_cr0(vcpu, cr0); /* enter rmode */
>>>
>>>     Meaning a VCPU will start with CD and NW set, in real mode, after
>>>     re-set.
>>>
>>>     This setting dates back to the birth of KVM:
>>>
>>>       commit 6aa8b732ca01c3d7a54e93f4d701b8aabbe60fb7
>>>       Author: Avi Kivity <[email protected]>
>>>       Date: Sun Dec 10 02:21:36 2006 -0800
>>>
>>>           [PATCH] kvm: userspace interface
>>>
>>>     Search that commit for "0x60000010" (the second hit, although the
>>>     comment that contains the first hit is quite telling as well).
>>>
>>> (3) Consult the Intel SDM, Table 11-5. "Cache Operating Modes".
>>>
>>>     The (CD, NW) == (1, 1) setting in CR0 is documented as:
>>>     - "Memory coherency is not maintained."
>>>     - "(P6 family and Pentium processors.) State of the processor after
>>>       a power up or reset. "
>>>     - [in footnote 2] "The Pentium 4 and more recent processor families
>>>       do not support this mode; setting the CD and NW bits to 1 selects
>>>       the no-fill cache mode."
>>>
>>>     In other words, the settings implemented by vmx_vcpu_reset()
>>>     actually invoke the behavior of the "no-fill cache mode" (which is
>>>     (CD, NW) == (1, 0)) for all practical purposes.
>>>
>>> (4) Same reference.
>>>
>>>     The (CD, NW) == (1, 0) setting in CR0 is documented as:
>>>     - "No-fill Cache Mode. Memory coherency is maintained."
>>>     - "(Pentium 4 and later processor families.) State of processor
>>>       after a power up or reset. "
>>>
>>> (5) The AsmEnableCache() function in
>>>     "MdePkg/Library/BaseLib/Ia32/EnableCache.c". It clears both CD and
>>>     NW in CR0.
>>>
>>> (6) This setting ((CD, NW) == (0, 0)) is documented in the Intel SDM as:
>>>     - "Normal Cache Mode. Highest performance cache operation."
>>>
>>> (7) The AsmEnableCache() function is invoked by MtrrLib
>>>     [UefiCpuPkg/Library/MtrrLib/MtrrLib.c] after any and all MTRR
>>>     changes. Consider:
>>>
>>>     PostMtrrChange() | MtrrSetAllMtrrs()
>>>       PostMtrrChangeEnableCache()
>>>         AsmEnableCache()
>>>
>>>     Where MtrrSetAllMtrrs() is a public function of the library; plus
>>>     PostMtrrChange() is invoked by all of the following public
>>>     functions:
>>>
>>>     - MtrrSetMemoryAttribute()
>>>     - MtrrSetVariableMtrr()
>>>     - MtrrSetFixedMtrr()
>>>
>>> (8) Because we call MtrrLib in PlatformPei first, there are two
>>>     consequences:
>>>
>>>     (a) The boot VCPU has CR0.CD *set* in all parts of OVMF that run
>>>         earlier than that.
>>>
>>>         This caused a widely reported boot perf regression in SEC (the
>>>         LZMA decompression). Ultimately another MTRR change in KVM was
>>>         reverted, so (as far as I know) this symptom has not been seen
>>>         recently. (In any case, we should probably fix this sometime...)
>>>
>>>     (b) The other consequence is that the boot VCPU's CR0.CD is clear in
>>>         the rest of OVMF. Which is what makes its speed acceptable, I
>>>         guess (as long as no APs are started up).
>>>
>>> (9) Our AP startup code massages CR0, but only for mode switches. CR0.CD
>>>     and CR0.NW are never touched.
>>>
>>>     Now, I guess this could be easily added to the assembly encoded as a
>>>     C array ("mStartupCodeTemplate" in "UefiCpuPkg/CpuDxe/ApStartup.c")
>>>     -- when cr0 is massaged anyway, just clear bits 29 and 30 too; same
>>>     as in AsmEnableCache().
>>>
>>>     However, for testing the idea, perhaps the following one-liner
>>>     suffices too -- this is the earliest an AP executes C code:
>>>
>>>> diff --git a/UefiCpuPkg/CpuDxe/CpuMp.c b/UefiCpuPkg/CpuDxe/CpuMp.c
>>>> index 3f56faa..e7f5b41 100644
>>>> --- a/UefiCpuPkg/CpuDxe/CpuMp.c
>>>> +++ b/UefiCpuPkg/CpuDxe/CpuMp.c
>>>> @@ -1451,6 +1451,8 @@ ApEntryPointInC (
>>>>    VOID*           TopOfApStack;
>>>>    UINTN           ProcessorNumber;
>>>>
>>>> +  AsmEnableCache ();
>>>> +
>>>>    if (!mAPsAlreadyInitFinished) {
>>>>      FillInProcessorInformation (FALSE, mMpSystemData.NumberOfProcessors);
>>>>      TopOfApStack  = (UINT8*)mApStackStart + gApStackSize;
>>>     This should clear CR0.CD, and "undo" kernel commit b18d5431acc7 for
>>>     the AP (by falsifying the second subcondition seen in (1)).
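
The bit logic behind points (1) through (9) can be sketched in plain C.
This is only an illustration of the reasoning, not KVM or edk2 code: the
function names are invented here, and the boolean parameter stands in for
kvm_arch_has_noncoherent_dma(). The (CD, NW) == (0, 1) case is not
discussed above; the SDM table lists it as an invalid setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_NW (1u << 29) /* Not Write-through */
#define CR0_CD (1u << 30) /* Cache Disable */

enum cache_mode { CACHE_NORMAL, CACHE_NO_FILL, CACHE_INVALID };

/* Classify (CD, NW) along the lines of the SDM table cited in (3)-(6). */
enum cache_mode cr0_cache_mode(uint32_t cr0)
{
    bool cd = (cr0 & CR0_CD) != 0;
    bool nw = (cr0 & CR0_NW) != 0;

    if (!cd && !nw)
        return CACHE_NORMAL;  /* (0, 0): normal cache mode */
    if (!cd && nw)
        return CACHE_INVALID; /* (0, 1): invalid setting per the SDM */
    return CACHE_NO_FILL;     /* (1, 0), and (1, 1) on Pentium 4 and later */
}

/* Effect of AsmEnableCache() on CR0 as described in (5): clear CD and NW. */
uint32_t cr0_enable_cache(uint32_t cr0)
{
    uint32_t result = cr0 & ~(CR0_CD | CR0_NW);
    assert((result & (CR0_CD | CR0_NW)) == 0);
    return result;
}

/* The condition from (1); noncoherent_dma is a stand-in for the KVM check. */
bool mt_mask_case_applies(bool noncoherent_dma, uint32_t cr0)
{
    return noncoherent_dma && (cr0 & CR0_CD) != 0;
}
```

Feeding in the reset value 0x60000010 from (2) classifies as no-fill mode
and satisfies the condition from (1) when a device is assigned; after
cr0_enable_cache() only the ET bit (0x10) remains and the condition no
longer holds.
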
>>>
>>> Janusz, can you please test this one-liner (with no other out-of-tree
>>> patch applied)?
>>>
>> Tested; it didn't solve the problem with detected CPUs.
> 
> Thanks for testing it. I'll try to reproduce the problem on my
> workstation next week.

* After reviewing the PCI devices and the IOMMU groups on my laptop, I 
successfully assigned the following device to a guest:

02:00.0 SD Host controller: O2 Micro, Inc. SD/MMC Card Reader Controller (rev 01) (prog-if 01)
        Subsystem: Lenovo Device 2211
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at b3901000 (32-bit, non-prefetchable) [size=4K]
        Memory at b3900000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [6c] Power Management version 3
        Capabilities: [48] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [80] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [230] Latency Tolerance Reporting
        Kernel driver in use: vfio-pci

(With -n:
02:00.0 0805: 1217:8520 (rev 01) (prog-if 01)
        Subsystem: 17aa:2211
)

* It is the sole device in its IOMMU group (13):

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/4/devices/0000:00:16.0
/sys/kernel/iommu_groups/5/devices/0000:00:19.0
/sys/kernel/iommu_groups/6/devices/0000:00:1a.0
/sys/kernel/iommu_groups/7/devices/0000:00:1b.0
/sys/kernel/iommu_groups/8/devices/0000:00:1c.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.1
/sys/kernel/iommu_groups/10/devices/0000:00:1c.4
/sys/kernel/iommu_groups/11/devices/0000:00:1d.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.2
/sys/kernel/iommu_groups/12/devices/0000:00:1f.3
/sys/kernel/iommu_groups/13/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:03:00.0
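
The "sole device in its group" check can also be done mechanically over a
listing like the one above. The following is a rough sketch with made-up
helper names; it only parses path strings of the form shown and does not
touch sysfs itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract the group number from a path such as
 * "/sys/kernel/iommu_groups/13/devices/0000:02:00.0".
 * Returns -1 if the path does not have that shape.
 */
int iommu_group_of(const char *path)
{
    int group;

    assert(path != NULL);
    if (sscanf(path, "/sys/kernel/iommu_groups/%d/devices/", &group) != 1)
        return -1;
    return group;
}

/* Count how many of the listed paths belong to the given group. */
size_t iommu_group_size(const char *const paths[], size_t count, int group)
{
    size_t members = 0;

    for (size_t i = 0; i < count; i++) {
        if (iommu_group_of(paths[i]) == group)
            members++;
    }
    return members;
}
```

A group size of 1 for group 13 corresponds to the situation described
here; group 12, by contrast, holds three functions (1f.0, 1f.2, 1f.3)
that would have to be assigned together.
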

* It is also not affected by any RMRR, according to the host dmesg.

* For completeness, the output of "lspci -tv":

-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller
           +-01.0-[01]----00.0  NVIDIA Corporation GK107GLM [Quadro K1100M]
           +-02.0  Intel Corporation 4th Gen Core Processor Integrated Graphics Controller
           +-03.0  Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
           +-16.0  Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
           +-19.0  Intel Corporation Ethernet Connection I217-LM
           +-1a.0  Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
           +-1b.0  Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
           +-1c.0-[02]----00.0  O2 Micro, Inc. SD/MMC Card Reader Controller
           +-1c.1-[03]----00.0  Intel Corporation Wireless 7260
           +-1c.4-[06-3f]--
           +-1d.0  Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
           +-1f.0  Intel Corporation QM87 Express LPC Controller
           +-1f.2  Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
           \-1f.3  Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller

* In the UEFI shell, the "PCI" command confirms that the firmware enumerates 
the device fine:

Shell> pci
   Seg  Bus  Dev  Func
   ---  ---  ---  ----
...
    00   00   0A    00 ==> Base System Peripherals - SD Host controller
             Vendor 1217 Device 8520 Prog Interface 1

(The qemu command line parameter is:
-device vfio-pci,host=02:00.0,id=hostdev0,bus=pci.0,addr=0xa
)

* From the enumeration log itself:

PciBus: Discovered PCI @ [00|0A|00]
   BAR[0]: Type =  Mem32; Alignment = 0xFFF;    Length = 0x1000;        Offset = 0x10
   BAR[1]: Type =  Mem32; Alignment = 0xFFF;    Length = 0x800; Offset = 0x14

...
PciBus: Resource Map for Root Bridge PciRoot(0x0)
...
Type =  Mem32; Base = 0x80000000;       Length = 0x1100000;     Alignment = 0xFFFFFF
...
   Base = 0x81000000;   Length = 0x800; Alignment = 0xFFF;      Owner = PCI [00|0A|00:14]
   Base = 0x81001000;   Length = 0x1000;        Alignment = 0xFFF;      Owner = PCI [00|0A|00:10]
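
As a sanity check on the resource map, the two placements can be verified
for alignment, containment and overlap. This is a small sketch with
invented names, assuming that the two "Owner" lines belong to the Mem32
window printed above them (the log is elided in between) and that the
"Alignment" values are masks, as their 0xFFF form suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One "Owner" line of the resource map. */
struct bar_placement {
    uint64_t base;
    uint64_t length;
    uint64_t alignment; /* mask form as printed in the log, e.g. 0xFFF */
};

/* A base is aligned when none of the mask bits are set in it. */
bool bar_is_aligned(const struct bar_placement *bar)
{
    return (bar->base & bar->alignment) == 0;
}

/* True when [base, base + length) lies inside the bridge window. */
bool bar_fits_window(const struct bar_placement *bar,
                     uint64_t window_base, uint64_t window_length)
{
    assert(bar->length > 0);
    return bar->base >= window_base &&
           bar->base + bar->length <= window_base + window_length;
}

/* True when the two half-open ranges share at least one byte. */
bool bars_overlap(const struct bar_placement *a,
                  const struct bar_placement *b)
{
    return a->base < b->base + b->length &&
           b->base < a->base + a->length;
}
```

With the values from the log (BAR[1] at 0x81000000, length 0x800; BAR[0]
at 0x81001000, length 0x1000; window 0x80000000, length 0x1100000), both
BARs are aligned, both fit in the window, and they do not overlap.
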

* The VCPU topology for the guest is sockets=1, cores=4, threads=2 (8 logical
processors in total). All of them are detected:

Detect CPU count: 8

* I'm not seeing any delays or errors. Some details about my config:

QEMU: upstream at bc79082e4cd12c1241fa03b0abceacf45f537740

Kernel: kvm/master at ad355e383d826e3506c3caaa0fe991fd112de47b
(with git-describe: v4.3-rc3-20-gad355e3)

edk2: SVN r18690 / git d26a7a3fa251e1c2e93bdb834207643eabb847de
      (none of the recent experimental patches are applied)

Host: Lenovo ThinkPad W541
CPU: Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
     (family 6, model 60, stepping 3, microcode 0x1c)
     topology matches the above VCPU topology: 1*4*2

I'm very sorry, but I don't think I can spend time on this, unless someone 
gives me ssh and/or console access to a host that readily reproduces the bug, 
with the latest kvm/master, qemu, and edk2 builds.

I have hard experimental evidence that direct access is the only way to analyze 
such bugs. For example, a few years ago I struggled with a nasty bug related to 
ixgbevf passthrough on Xen, for *months*, on and off. Once the reporter gave me 
ssh access to the box, the bug went down in *one day*.

https://bugzilla.redhat.com/show_bug.cgi?id=862862#c85
...
https://bugzilla.redhat.com/show_bug.cgi?id=862862#c116

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel
