On 10/25/2017 08:13 PM, Dong, Eric wrote:
Laszlo,
-----Original Message-----
From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of
Laszlo Ersek
Sent: Wednesday, October 25, 2017 11:08 PM
To: Dong, Eric <eric.d...@intel.com>; edk2-devel@lists.01.org
Cc: Ni, Ruiyu <ruiyu...@intel.com>; Paolo Bonzini <pbonz...@redhat.com>
Subject: Re: [edk2] [Patch 2/2] UefiCpuPkg/MpInitLib: Enhance waiting for
AP initialization logic.
Hi Eric,
On 10/25/17 07:42, Dong, Eric wrote:
Hi Laszlo,
I think I now understand the issues you raised for the OVMF platform. In that case, your platform is not suited to this code change. How about I make the following change on top of the new code:
-  if (CpuMpData->CpuCount == 0) {
     TimedWaitForApFinish (
       CpuMpData,
       PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
       PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
       );
-  }
After this change, OVMF can still set PcdCpuApInitTimeOutInMicroSeconds to MAX_UINT32 to keep the old solution. A real platform can set a small value to follow the new solution. With this new change, we just wait one more PcdCpuApInitTimeOutInMicroSeconds timeout, which should have no performance impact (this time may also be spent in the later while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) loop), or only a little performance impact (if it is not covered by the later loop).
The suggested change will work for OVMF, thanks!
(Just please don't forget to un-indent the TimedWaitForApFinish() call that
will become unconditional again.)
--o--
Do you think Brian's concern should be investigated as well (perhaps in a
separate Bugzilla entry)?
As Jeff has mentioned, only OVMF can know the number of existing APs before sending the INIT-SIPI-SIPI, so we don't know how many bits are needed for this bitmap. Even if we could create the bitmap, we still wouldn't know when all APs have been found (if the "started" bitmap equals the "finished" bitmap, we still can't conclude that all APs have been found, because other APs may not have started yet).
Right, physical platforms don't usually know the number of CPUs up
front, so they still need a timeout. That's unavoidable. But we've
seen cases where interrupts aren't getting delivered reliably, so not
all the expected CPUs start up (based on the core counts in the physical
sockets, as known by the developing engineer, not the firmware.) To
debug this failure, it's very useful to have a list of exactly which
CPUs did start.
Similarly, we've seen cases where a CPU starts, but crashes due to h/w
or s/w bugs before signaling the BSP that it has finished. With only a
counter to indicate how many CPUs are still running, it's impossible to
tell which CPUs failed. That makes debugging much more difficult.
The bitmaps would need to be sized according to the maximum number of
CPUs supported by the platform (or the maximum APIC ID, depending on how
they are indexed), like other data structures in the MpCpu code.
Bitmaps aren't the only possible implementation of course.... My point
is that it's useful to have a list of which CPUs started executing, and
which finished.
Thanks,
Eric
Because, I believe, the scheduling pattern that I described earlier could be possible on physical platforms as well, at least in theory:

(2) After at least one AP has started running, the logic expects "NumApsExecuting" to monotonically grow for a while, and then to monotonically decrease, back to zero. For example, if we have 1 BSP and 7 APs, the BSP logic more or less expects the following values in "NumApsExecuting":

  1; 2; 3; 4; 5; 6; 7;
  6; 5; 4; 3; 2; 1; 0

While this may be a valid expectation for physical processors (which actually run in parallel, in the physical world), in a virtual machine, it is not guaranteed.
Depending on hypervisor scheduling artifacts, it is possible that, say, three APs start up *and finish* before the remaining four APs start up *at all*. The INIT-SIPI-SIPI marks all APs for execution / scheduling in the hypervisor, yes, but the actual code execution may commence a lot later. For example, the BSP may witness the following series of values in "NumApsExecuting":
1; 2; 3;
2; 1; 0;
1; 2; 3; 4;
3; 2; 1; 0
and the BSP could think that there are 3 APs only, when it sees the
first 0 value.
I don't know if such a pattern is "likely", "unlikely", or "extremely unlikely" on physical platforms. I think the pattern is "theoretically possible" at least.
I suggest that we go ahead with the small patch for the OVMF use case first,
and then open a new BZ if Brian thinks there's a real safety problem with the
code.
We also noticed this theoretical problem when we implemented this change, but we think it is unlikely to happen on physical platforms. We believe the value of making this change far outweighs the risk posed by this theoretical problem.
I strongly disagree. Any chance for it to happen on physical platforms
is too large of a chance. There's simply too much variance between
physical platforms to make assumptions about interrupt delivery and the
interleaving of atomic operations among dozens (or thousands!) of CPUs.
With the latest change from Eric above, we can restore the old behavior
by setting PcdCpuApInitTimeOutInMicroSeconds large enough to cover all
APs. So the new behavior is just an optimization which a platform can
enable if its timing characteristics are suitable for it. That should work.
Thanks,
Brian
Thanks,
Eric
... I should note that, with the small patch that's going to fix the OVMF use
case, the previous behavior will be restored for *all* platforms that continue
using their previous PCD values.
The cumulative effect of commit 0594ec417c89 and the upcoming patch will
be that a platform may significantly decrease
"PcdCpuApInitTimeOutInMicroSeconds", if it chooses to, and then the
"NumApsExecuting" loop will take the role of the preexisting
TimedWaitForApFinish() call. If a platform doesn't want that, then it should
keep its "PcdCpuApInitTimeOutInMicroSeconds" as high as before, and then
any implementation details in the "NumApsExecuting" loop will be irrelevant.
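To make the choice concrete, a platform DSC could pin either behavior via the PCD. The values below are hypothetical; PcdCpuApInitTimeOutInMicroSeconds and its gUefiCpuPkgTokenSpaceGuid token space come from UefiCpuPkg.

```ini
[PcdsFixedAtBuild]
  # Keep the previous behavior (OVMF-style): wait the full, generous
  # timeout for all APs before proceeding.
  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|0xFFFFFFFF

  # Or opt in to the optimization: a short timeout (hypothetical value),
  # after which the "NumApsExecuting" loop takes over the role of the
  # TimedWaitForApFinish() call.
  #gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
```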
Thanks!
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
--
Brian
--------------------------------------------------------------------
"This isn't a design, it's a hack."
-- Stephen Gunn