On 10/29/15 22:13, Laszlo Ersek wrote:
> On 10/29/15 19:39, Jordan Justen wrote:
>> On 2015-10-29 04:45:37, Laszlo Ersek wrote:
>>> On 10/29/15 02:32, Jordan Justen wrote:
>>>> +    ASSERT (MaxProcessors > 0);
>>>> +    PcdSet32 (PcdCpuMaxLogicalProcessorNumber, MaxProcessors);
>>>
>>> I think that when this branch is active, then
>>> PcdCpuApInitTimeOutInMicroSeconds should *also* be set, namely to
>>> MAX_UINT32 (~71 minutes, the closest we can get to "infinity"). When
>>> this hint is available from QEMU, then we should practically disable the
>>> timeout option in CpuDxe's AP counting.
>>
>> I think this is a good idea, but I don't think 71 minutes is useful.
>> Perhaps 30 seconds? This seems more than adequate for hundreds of
>> processors to start up. Or perhaps some timeout based on the number of
>> processors?
> 
> No, my suggestion with the 71 minutes didn't aim at a "useful" timeout.
> Instead, when QEMU provides the number of VCPUs via fw_cfg, I'd like to
> take the timeout *completely* out of the picture. Wait until the
> advertised number of VCPUs come up, period. If they don't all appear,
> then hang forever. Well, at least for 71 minutes, which is the same for
> interactive users.
> 
> If 30 seconds elapse and we boot with 1 or 2 VCPUs missing, then things
> will break hard. I don't actually *expect* this to occur against a 30
> second timeout, but 30 seconds still sends the wrong message to the
> programmer and the user. It looks like a real, reasonable timeout. While
> in this case, the loop should never exit on a timeout, and 0xFFFFFFFF
> communicates that.
> 
>> Janusz and I were discussing
>> https://github.com/tianocore/edk2/issues/21 on irc. We increased the
>> timeout to 10 seconds, and with only 8 processors it was still timing
>> out.
> 
> Ugh.
> 
>> Obviously we are somehow failing to start the processors correctly, or
>> QEMU/KVM is doing something wrong.
> 
> I think the actual issue we're fighting here is described in
> <http://thread.gmane.org/gmane.comp.bios.edk2.devel/3260>. Due to the
> kernel commit named there, and due to a physical device being assigned
> to the guest, guest memory becomes uncacheable for each AP, until the AP
> clears CR0.CD. I guess... And that should slow AP startup down extremely.
> 
>> Have you been able to reproduce this issue?
> 
> I think I have, although I didn't try. :) My current host kernel is
> based on v4.3-rc3 (upon which kvm/master is based, upon which I have a
> fix), and the commit in question (b18d5431acc7) is part of v4.2-rc1.
> 
> If you have a host kernel at least as fresh as v4.2-rc1 (and I do, see
> above), then you run into the issue automatically. For which reason I've
> been carrying my patch referenced above in my development branches --
> I've been focusing on the SMM issues, and solving (or working around)
> the MP startup problem is a prerequisite for that.
> 
> So, yes, saw it, worked around it immediately, forgot about it. :)
> 
>> It seems like we need to
>> set the timeout to 71 minutes, and then debug QEMU/KVM to see what
>> state the APs are in...
> 
> I'm a bit overloaded to tackle this right now, but...
> 
>> Unfortunately I haven't yet been able to reproduce the bug on my
>> system. :(
> 
> if you install a host kernel at least as recent as v4.2-rc1, then the
> bug should pop up at once.

Sorry, I forgot about a tiny circumstance: you'll also have to assign a
physical device (like a GPU) to your guest. Now *that* is a whole story
per se. See <http://vfio.blogspot.com/>.

In fact, I'm now thinking that I might not have reproduced the issue at
all! (And perhaps I just applied the workaround patch as precaution.) My
workstation does have an assigned GPU, but the kernel on it is older
than v4.2-rc1 -- no issues there. And on my laptop (which has the recent
kernel) no devices can be reasonably assigned to guests. (It has VT-d,
and even the IOMMU groups look good, but the PCI devices are not
"discrete" enough from a guest driver perspective.)

Apologies, I can't help with this right now...
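
P.S. For reference, the PCD arrangement discussed above could look roughly
like the following in OVMF's PlatformPei. This is only a sketch of the idea,
not the actual patch; the fw_cfg read via QemuFwCfgLib and the exact call
site are my assumptions:

```c
//
// Sketch only: fetch the boot CPU count from fw_cfg and configure the PCDs
// discussed in this thread. Assumes OVMF PlatformPei context with
// QemuFwCfgLib and PcdLib available.
//
UINT16 BootCpuCount;

QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
BootCpuCount = QemuFwCfgRead16 ();
if (BootCpuCount > 0) {
  PcdSet32 (PcdCpuMaxLogicalProcessorNumber, BootCpuCount);
  //
  // Take the timeout out of the picture: wait MAX_UINT32 microseconds
  // (about 71 minutes) for all advertised VCPUs to come up.
  //
  PcdSet32 (PcdCpuApInitTimeOutInMicroSeconds, MAX_UINT32);
}
```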

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel