On 10/29/15 22:13, Laszlo Ersek wrote:
> On 10/29/15 19:39, Jordan Justen wrote:
>> On 2015-10-29 04:45:37, Laszlo Ersek wrote:
>>> On 10/29/15 02:32, Jordan Justen wrote:
>>>> + ASSERT (MaxProcessors > 0);
>>>> + PcdSet32 (PcdCpuMaxLogicalProcessorNumber, MaxProcessors);
>>>
>>> I think that when this branch is active, then
>>> PcdCpuApInitTimeOutInMicroSeconds should *also* be set, namely to
>>> MAX_UINT32 (~71 minutes, the closest we can get to "infinity"). When
>>> this hint is available from QEMU, then we should practically disable the
>>> timeout option in CpuDxe's AP counting.
>>
>> I think this is a good idea, but I don't think 71 minutes is useful.
>> Perhaps 30 seconds? This seems more than adequate for hundreds of
>> processors to startup. Or perhaps some timeout based on the number of
>> processors?
>
> No, my suggestion with the 71 minutes didn't aim at a "useful" timeout.
> Instead, when QEMU provides the number of VCPUs via fw_cfg, I'd like to
> take the timeout *completely* out of the picture. Wait until the
> advertised number of VCPUs come up, period. If they don't all appear,
> then hang forever. Well, at least for 71 minutes, which is the same for
> interactive users.
>
> If 30 seconds elapse and we boot with 1 or 2 VCPUs missing, then things
> will break hard. I don't actually *expect* this to occur against a 30
> second timeout, but 30 seconds still sends the wrong message to the
> programmer and the user. It looks like a real, reasonable timeout. While
> in this case, the loop should never exit on a timeout, and 0xFFFFFFFF
> communicates that.
>
>> Janusz and I were discussing
>> https://github.com/tianocore/edk2/issues/21 on irc. We increased the
>> timeout to 10 seconds, and with only 8 processors it was still timing
>> out.
>
> Ugh.
>
>> Obviously we are somehow failing to start the processors correctly, or
>> QEMU/KVM is doing something wrong.
>
> I think the actual issue we're fighting here is described in
> <http://thread.gmane.org/gmane.comp.bios.edk2.devel/3260>. Due to the
> kernel commit named there, and due to a physical device being assigned
> to the guest, guest memory becomes uncacheable for each AP, until the AP
> clears CR0.CD. I guess... And that should slow it down extremely.
>
>> Have you been able to reproduce this issue?
>
> I think I have, although I didn't try. :) My current host kernel is
> based on v4.3-rc3 (upon which kvm/master is based, upon which I have a
> fix), and the commit in question (b18d5431acc7) is part of v4.2-rc1.
>
> If you have a host kernel at least as fresh as v4.2-rc1 (and I do, see
> above), then you run into the issue automatically. For which reason I've
> been carrying my patch referenced above in my development branches --
> I've been focusing on the SMM issues, and solving (or working around)
> the MP startup problem is a prerequisite for that.
>
> So, yes, saw it, worked around it immediately, forgot about it. :)
>
>> It seems like we need to
>> set the timeout to 71 minutes, and then debug QEMU/KVM to see what
>> state the APs are in...
>
> I'm a bit overloaded to tackle this right now, but...
>
>> Unfortunately I haven't yet been able to reproduce the bug on my
>> system. :(
>
> if you install a host kernel at least as recent as v4.2-rc1, then the
> bug should pop up at once.

Sorry, forgot about a tiny circumstance: you'll also have to assign a
physical device (like a GPU) to your guest. Now *that* is a whole story
per se. See <http://vfio.blogspot.com/>.

In fact, I'm now thinking that I might not have reproduced the issue at
all! (And perhaps I just applied the workaround patch as a precaution.)
My workstation does have an assigned GPU, but the kernel on it is older
than v4.2-rc1 -- no issues there. And on my laptop (which has the recent
kernel) no devices can be reasonably assigned to guests. (It has VT-d,
and even the IOMMU groups look good, but the PCI devices are not
"discrete" enough from a guest driver perspective.)

Apologies, I can't help with this right now...

Thanks
Laszlo
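
For reference, the "~71 minutes" figure in the quoted discussion follows
from reading MAX_UINT32 as a microsecond count. Below is a minimal
standalone C sketch of that arithmetic, together with a count-or-timeout
wait loop of the kind being debated. It is not the CpuDxe code:
timeout_us_to_minutes, wait_for_aps, poll_fn, demo_poll and the step_us
parameter are hypothetical names made up for illustration, and no real
delay is performed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: returns how many APs have checked in so far. */
typedef uint32_t (*poll_fn)(void *ctx);

/* Convert a microsecond timeout to whole minutes (truncating). */
static uint32_t timeout_us_to_minutes(uint32_t timeout_us)
{
    return (uint32_t)((uint64_t)timeout_us / 1000000u / 60u);
}

/*
 * Illustrative wait loop (an assumption, NOT the CpuDxe implementation):
 * poll until `expected` APs are counted or the microsecond budget is
 * used up. `step_us` only models the budget consumed per poll; nothing
 * actually sleeps. Returns the number of APs seen when the loop ends.
 */
static uint32_t wait_for_aps(poll_fn poll, void *ctx, uint32_t expected,
                             uint32_t timeout_us, uint32_t step_us)
{
    uint64_t elapsed_us = 0;
    uint32_t seen;

    assert(step_us > 0); /* a zero step would never exhaust the budget */

    seen = poll(ctx);
    while (seen < expected && elapsed_us < timeout_us) {
        elapsed_us += step_us;
        seen = poll(ctx);
    }
    return seen;
}

/* Demo poll: pretend one more AP checks in on every call. */
static uint32_t demo_poll(void *ctx)
{
    uint32_t *count = ctx;
    return ++*count;
}
```

In this model, a finite budget lets the loop return with fewer APs than
advertised, while UINT32_MAX makes the timeout branch practically
unreachable, which is the intent described in the quoted text.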

