On 1/12/23 17:08, Michael Brown wrote:
> On 12/01/2023 13:22, Laszlo Ersek wrote:
>>>> Detect the issue in PlatformMaxCpuCountInitialization(), and
>>>> print an error message and *hang* if the issue is present.
>>>
>>> Would this mean that OVMF would refuse to start with all current
>>> distro versions of qemu (when not using KVM), or am I
>>> misunderstanding?
>>
>> Your understanding is correct.
>
> I apologise in advance if this is a stupid question, but: given that
> we can detect the issue (as per this patch), and given also:
>
>>> On 12/01/2023 08:28, Laszlo Ersek wrote:
>>>> In QEMU v5.1.0, the CPU hotplug register block misbehaves: the
>>>> negotiation protocol is (effectively) broken such that it
>>>> suggests that switching from the legacy interface to the modern
>>>> interface works, but in reality the switch never happens.
> ...would it work to detect the issue and treat it as "modern interface
> is not supported: continue to use the legacy interface"?  IOW, to
> treat "Present=0" as indicating that the modern interface is not
> supported.

I considered that. It's not a bad idea at all, but it's more complicated
than that.

The code in PlatformPei (or more precisely, nowadays in PlatformInitLib)
could be rearranged. It would be a really ugly patch, because now, after
CmdData2 reads as 0, we're simply not prepared to renege on that
determination. So the branches would have to be rearranged in a quite
ugly manner. *But* it would be doable of course.

However, in the CpuHotplugSmm driver, we also read the same register
(QEMU_CPUHP_R_CMD_DATA2), for determining feature availability. See the
QemuCpuhpReadCommandData2() call in
"OvmfPkg/CpuHotplugSmm/CpuHotplug.c", function CpuHotplugEntry(). And
the outcome of *that* check is final. No counting is done there anymore,
because it's unnecessary in the first place. If the check succeeds, we
proceed, if the check fails, we hang hard. That sanity check is also
misled by the QEMU bug, and tweaking *that* check is out of question.

And I didn't want to post a patch that would make PlatformPei deal with
the QEMU bug one way, and CpuHotplugSmm another way.

Technically, the following should be possible: in PlatformInitLib,
rearrange the branches such that we can still do "something" if the QEMU
bug is detected. Then, if PcdSmmSmramRequire is TRUE, hang hard, and if
PcdSmmSmramRequire is FALSE, then just pretend the modern hotplug
register block is not available.

Unfortunately, I really dislike this. It will cause the same hang on
distros that build OVMF -- possibly their *only* OVMF binary -- with -D
SMM_REQUIRE. Then the request will morph into "can we refine this even
further". And sure, you can refine it even further:

- the CpuHotplugSmm driver exits gracefully if QEMU does not negotiate
the "SMRAM at default SMBASE" feature (see edk2 commit d74d56fcfa31
("OvmfPkg: introduce PcdQ35SmramAtDefaultSmbase", 2020-02-05), and QEMU
commit f404220e279c ("q35: implement 128K SMRAM at default SMBASE
address", 2020-01-22)). This can be disabled through the
"mch.smbase-smram" property on the QEMU command line.

- QEMU can be configured with other compat properties on the command
line so that "CPU hotplug with SMI" and "CPU hot-unplug with SMI" *not*
be offered to the firmware. Then QEMU will reject hotplug attempts, and
the SMM hotplug code in edk2 will not be triggered by the (virtual)
hardware.

The case is that both QEMU and edk2 check for each other's supported
features. It's a complex interwoven feature set with security impact,
which is exactly why we added feature negotiation at every step --
effectively mutual negotiation wherever necessary. I cannot claim I
remember every part of it, and playing tricks around feature negotiation
with SMM impact makes me *extremely uncomfortable*. I absolutely don't
want to author an OVMF patch, briefly before I disappear again (for
good!), that "looks good" now, and then becomes a horrible SMM CVE in a
year or two. I want to go for "obviously no bug", rather than "no
obvious bug".

In brief, my counter-argument is: once we start refining the hang, it
will never be nuanced and polished enough for users, and we'll just heap
on more complexity, until we introduce an obscure but nasty bug. I don't
want to start down that path.

I can live with this patch being rejected altogether (that's consistent
with me having exited edk2 -- I can pretend I don't know about it; it's
not my job anymore). I will not take responsibility for relaxing the
proposed hang however, not even as a reviewer ACKing it. If someone can
rearrange the code, drawing a *practical* but also secure boundary for
the fallback, that's great, the patch can go in without me "being in the
know". I want none of that responsibility.

I don't want to force my hard-liner view on QEMU/OVMF users here; I need
not participate in a relaxed approach either, however. This reads like a
re-run of the "old grub in guest versus non-executable UEFI heap"
discussion, and it's giving me the shivers.

Your proposal is entirely justified, from a practical / user
perspective, but I'm not the right person for it.

Thanks
Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#98385): https://edk2.groups.io/g/devel/message/98385
Mute This Topic: https://groups.io/mt/96218818/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to