If the C-bit is cleared for a read/write buffer, is the content in the
read/write buffer also exposed to hypervisor?
Not immediately. Only when the guest also rewrites the memory through
the appropriate virtual address.
Basically, in the virtual machine, the C-bit in a PTE that covers a
particular guest virtual address (GVA) controls whether a guest write
through that GVA will result in memory encryption, and if a gues read
through that GVA will result in memory decryption.
The current state of the C-bit doesn't matter for the hypervisor, what
matters is the last guest write through a GVA whose PTE had the C-bit
set, or clear. If the C-bit was clear when the guest last wrote the
area, then the hypervisor can read the data. If the C-bit was set, then
the hypervisor can only read garbage.
I means if there is security concern, the concern should be applied to
both common buffer and read/write buffer.
Then we have to figure a way to resolve both buffer.
Yes, this is correct. The PciIo.Unmap operation, as currently
implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
correctly for all operations, but it can only guarantee *not* freeing
memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
is safe, while Unmap()ing Read/Write is not. The encryption would be
re-established just fine for Read/Write as well, but we would change the
UEFI memmap.
In OVMF, we currently manage this problem by making all asynchronous DMA
mappings CommonBuffer, even if they could othewise be Read or Write. We
use Read and Write only if the DMA operation completes before the
higher-level protocol function returns (so we can immediately Unmap the
operation, and the ExitBootServices() handler doesn't have to care).
To be honest, that is exactly my biggest question on this patch:
why do we only handle the common buffer specially?
For the following reason:
- Given that CommonBuffer mappings take a separate AllocateBuffer() /
FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
implementors -- beyond what the UEFI spec requries -- to keep *all*
dynamic memory management out of Map/Unmap. If they need dynamic memory
management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
This way Unmap is safe in ExitBootServices handlers.
- We cannot *reasonably ask* PciIo implementors to provide the same
guarantee for Read and Write mappings, simply because there are no other
APIs that could manage the dynamic memory for Map and Unmap --
AllocateBuffer and FreeBuffer are not required for Read and Write
mappings. PciIo implementors couldn't agree to our request even if they
wanted to. Therefore Unmapping Read/Write operations is inherently
unsafe in EBS handlers.
NOTE: The device should still be halt state, because the device is
also controlled by OS.
This is the key point -- the device is *not* controlled by the guest OS,
in the worst case.
The virtual device is ultimately implemented by the hypervisor. We don't
expect the virtual device (= the hypervisor) to mis-behave on purpose.
However, if the hypervisor is compromised by an external attacker (for
example over the network, or via privilege escalation from another
guest), then all guest RAM that has not been encrypted up to that point
becomes visible to the attacker. In other words, the hypervisor ("the
device") becomes retro-actively malicious.
[Jiewen] If that is the threat model, I have a question on the exposure:
1) If the concern is for existing data, it means all DMA data already written.
However, the DMA data is already consumed or produced by virtual device
or a hypervisor. If the hypervisor is attacked, it already gets all the data
content.
Maybe, maybe not. The data may have been sent to a different host over
the network, and wiped from memory.
Or, the data may have been written to a disk image that is separately
encrypted by the host OS, and has been detached (unplugged) from the
guest, and also unmounted from the host, with the disk key securely
wiped from host memory.
We can also imagine the reverse direction. Assume that the user of the
virtual machine goes into the UEFI shell in OVMF, starts the EDIT
program, and types some secret information into a text file on the ESP.
Further assume that the disk image is encrypted on the host OS. It is
conceivable that fragments of the secret could remain stuck in the AHCI
(disk) and USB (keyboard) DMA buffers.
Maybe you think that these are "made up" examples, and I agree, I just
made them up. The point is, it is not my place to discount possible
attack vectors (I haven't designed SEV). Attackers will be limited by
their *own* imaginations only, not by mine :)
2) if the concern is for future data, it means all user data to be written.
However, the C-bit already prevent that.
As far as I understand SEV, future data is out of scope. The goal is to
prevent *retroactive* guest data leaks (= whatever is currently in host
OS memory) if an attacker compromises an otherwise non-malicious hypervisor.
If an attacker compromises a virtualization host, then they can just sit
silent and watch data flow into and out of guests from that point
onward, because they can then monitor all DMA (which always happens in
clear-text) real-time.
Maybe I do not fully understand the threat model defined.
If you can explain more, that would be great.
Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
correct me if I'm wrong.
The point of SEV is to keep as much guest data encrypted at all times as
possible, so that *whenever* the hypervisor is compromised by an
attacker, the guest memory -- which the attacker can inevitably dump
from the host side -- remains un-intellegible to the attacker.
[Jiewen] OK. If this is a security question, I suggest we define a clear
threat model at first.
Or what we are doing might be unnecessary or insufficient.
I agree.
As I said above, my operating principle has been to re-encrypt all DMA
buffers as soon as possible. For long-standing DMA buffers, re-encrypt
them at ExitBootServices at the latest.
[Jiewen] For "require that Unmap() work for CommonBuffer
operations without releasing memory", I would hold my opinion until
it is documented clearly in UEFI spec.
You are right, of course.
It's just that we cannot avoid exploring ways, for this feature, that
currently fall outside of the spec. Whatever we do here will be outside
of the spec, one way or another. I suggested what I thought could be a
reasonable extension to the spec, one that could be implemented by PciIo
implementors even before the spec is modified.
edk2's PciIo.Unmap already behaves like this, without breaking the spec.
If there's a better way -- for example modifying CoreExitBootServices()
in edk2, to signal IOMMU drivers separately, *after* notifying other
ExitBootServices() handlers --, that might work as well.
My current concern is:
First, this sentence is NOT defined in UEFI specification.
Correct.
Second, it might break an existing implementation or existing feature, such as
tracing.
Correct.
Till now, I did not see any restriction in UEFI spec say: In this function, you
are not allowed
to call memory services.
The only restriction is
1) TPL restriction, where memory service must be <= TPL_NOTIFY.
2) AP restriction, where no UEFI service/protocol is allowed for AP.
I agree.
I'm just trying to operate with the APIs currently defined by the UEFI
spec, and these assumptions were the best I could come up with.
[Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
Especially the IHV Option ROM should not consume any private API.
I disagree about "only follow". If there are additional requirements
that can be satisfied without breaking the spec, driver implementors can
choose to conform to both sets of requirements.
For example (if I understand correctly), Microsoft / Windows present a
bunch of requirements *beyond* the UEFI spec, for both platform and
add-on card firmware. Most vendors conform :)
[Jiewen] I am not sure who will control "When control is transferred to the OS,
all
guest memory should be encrypted again, even those areas that were once
used as CommonBuffers."
For SEV case, I think it should be controlled by OS, because it is just CR3.
If it was indeed just CR3, then I would fully agree with you.
But -- to my understanding --, ensuring that the memory is actually
encrypted requires that the guest *write* to the memory through a
virtual address whose PTE has the C-bit set.
And I don't think the guest OS can be expected to rewrite all of its
memory at launch. :(
[Jiewen] That is fine.
I suggest we get clear on the threat model as the first step.
The threat model for SEV usage in OVMF.
I am sorry if that is already discussed before. I might ignore the conversation.
No problem; it's always good to re-investigate our assumptions and
operating principles.
If you or any SEV owner can share the insight, that will be great.
See my question above "If that is the threat model, I have a question on the
exposure:..."
I shared *my* impressions of the threat model (which are somewhat
unclear, admittedly, and thus could make me overly cautious).
I hope Brijesh can extend and/or correct my description.
So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
are opposites -- VT-d permits all DMA unless configured otherwise, while
SEV forbids all DMA unless configured otherwise.
[Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable
all DMA access.
I setup translation table, but mark all entry to be NOT-PRESENT.
I grant DMA access only if there is a specific request by a device.
I open DMA access in ExitBootServices, just want to make sure it is compatible
to
the existing OS, which might not have knowledge on IOMMU.
I will provide a patch to make it a policy very soon. As such VTd IOMMU may
turn off IOMMU, or keep it enabled at ExitBootService.
An IOMMU aware OS may take over IOMMU directly and reprogram it.
Thanks for the clarification.
But, again, will this not lead to the possibility that the VT-d IOMMU's
ExitBootServices() handler disables all DMA *before* the PCI drivers get
a chance to run their own ExitBootServices() handlers, disabling the
individual devices?
[Jiewen] Sorry for the confusing. Let me explain:
No, VTd never disables *all* DMA buffer at ExitBootService.
"disable VTd" means to "disable IOMMU protection" and "allow all DMA".
"Keep VTd enabled" means to "keep IOMMU protection enabled" and
"only allow the DMA from Map() request".
Okay, but this interpretation was exactly what I thought of first (see
above): "VT-d permits all DMA unless configured otherwise". You answered
that it wasn't the case.
So basically, if I understand it correctly now, at ExitBootServices the
VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?
That is equivalent to decrypting all memory under SEV, and is the exact
opposite of what we want. (As I understand it.)
If that happens, then a series of IOMMU faults could be generated, which
I described above. (I.e., such IOMMU faults could result, at least in
the case of SEV, in garbage being written to disk, via queued commands.)
[Jiewen] You are right. That would not happen.
IOMMU driver should not bring any compatibility problem for the PCI driver,
iff the PCI driver follows the UEFI specification.
Agreed.
Now, to finish up, here's an idea I just had.
Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
notification function?
If we are, then we could do the following:
* PciIo.Unmap() and friends would work as always (no restrictions on
dynamic memory allocation or freeing, for any kind of DMA operation).
* PCI drivers would not be expected to call PciIo.Unmap() in their
ExitBootServices() handlers.
* The IOMMU driver would have an ExitBootServices() notification
function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
(doesn't matter which).
* In this notification function, the IOMMU driver would signal *another*
event (a private one). The notification function for this event would
be enqueued strictly at the TPL_CALLBACK level.
* The notification function for the second event (private to the IOMMU)
would iterate over all existent mappings, and unmap them, without
allocating or freeing memory.
The key point is that by signaling the second event, the IOMMU driver
could order its own cleanup code after all other ExitBootServices()
callbacks. (Assuming, at least, that no other ExitBootServices()
callback employs the same trick to defer itself!) Therefore by the time
the second callback runs, all PCI devices have been halted, and it is
safe to tear down the translations.
The ordering would be ensured by the following:
- The UEFI spec says under CreateEventEx(), "All events are guaranteed
to be signaled before the first notification action is taken." This
means that, by the time the IOMMU's ExitBootServices() handler is
entered, all other ExitBootServices() handlers have been *queued* at
least, at TPL_CALLBACK or TPL_NOTIFY.
- Therefore, when we signal our second callback from there, for
TPL_CALLBACK, the second callback will be queued at the end of the
TPL_CALLBACK queue.
- CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
for the Type argument of CreateEvent." So it wouldn't matter if a
driver used CreateEvent() or CreateEventEx() for setting up its own
handler, the handler would be queued just the same.
I think this is ugly and brittle, but perhaps the only way to clean up
*all* translations safely, with regard to PciIo.Unmap() +
ExitBootServices(), without updating the UEFI spec.
What do you think?
[Jiewen] If the order is correct, and all PCI device driver is halt at
ExitBootServices, that works.
We do not need update PCI driver and we do not need update UEFI spec.
We only need update IOMMU driver which is concerned, based upon the threat
model.
That probably is best solution. :-)
I'm very glad to hear that you like the idea.
However, it depends on whether we are permitted, by the UEFI spec, to
signal another event in an ExitBootServices() notification function.
Are we permitted to do that?
Does the UEFI spec guarantee that the notification function for the
*second* event will be queued just like it would be under "normal"
circumstances?
(I know we must not allocate or free memory in the notification function
of the *second* event either; I just want to know if the second event's
handler is *queued* like it would normally be.)
I assume you want to handle both common buffer and read/write buffer, right?
Yes. Under this idea, all kinds of operations would be cleaned up.
And if possible, I still have interest to get clear on the threat model for SEV
in OVMF.
If you or any SEV owner can share ...
Absolutely. Please see above.
Thank you!
Laszlo