Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

Brijesh Singh Thu, 07 Sep 2017 07:40:54 -0700

Hi Jiewen,

On 09/07/2017 06:46 AM, Laszlo Ersek wrote:

On 09/07/17 06:46, Yao, Jiewen wrote:

Thanks for the sharing, Brijesh.


"If a page was marked as "shared"
then its guest responsibility to make it "private" after its done communicating 
with
hypervisor."

I believe I have same understanding - a *guest* should guarantee that.

My question is: During the *transition* from firmware to OS, *which guest* should make the shared 
buffer to be private? Is it "guest firmware" or "guest OS"?

Maybe I can ask the specific question to get it more clear.

1)       If the common DMA buffer is not unmapped at ExitBootService, is that 
treated as an issue?

2)       If the read/write DMA buffer is not unmapped at ExitBootService, is 
that treated as an issue?


Very good questions, totally to the point.

On the authoritative answer, I defer to Brijesh.



Both the above cases (#1 and #2) are problems. Since buffers was owned by 
firmware
and firmware made it "shared" hence firmware is responsible to mark them as 
private
after its done with the buffer. In other words, we must call Unmap() from 
ExitBootServices
to ensure that buffers mapped using 
BusMasterCommonBuffer/BusMasterRead/BusMasterWrite
is marked as "private" before we make it available to the guest OS. (we do 
similar thing
in Linux OS).

Having any kind of side channel to communicate the encryption status of a page
will not work -- we should be able to support a usecase where we boot OVMF in
64-bit but launch 32-bit Linux guest. When Linux boots in 32-bit mode it does 
not have
access to encryption bit (C-bit is bit-47 in page table) and can't mark the 
page as
private (even if we provide some kind of side-channel).

thank you very much for all your help.

(

My personal opinion is that both situations (#1 and #2) are problems,
because they break the *practical* security invariant for SEV guests:

- most memory should be encrypted at all times, *and*

- any memory that is decrypted must have an owner that is responsible
   for re-encrypting the memory eventually.

Therefore, *either* the firmware has to re-encrypt all remaining DMA
buffers at ExitBootServices(), *or* a new information channel must be
designed, from firmware to OS, to carry over the decryption status.

I strongly prefer the first option, for the following reason: the same
questions apply to all EDK2 IOMMU protocol interfaces, not just the one
exported by the SEV driver.

)

Thanks,
Laszlo

From: Brijesh Singh [mailto:[email protected]]
Sent: Wednesday, September 6, 2017 11:40 PM
To: Laszlo Ersek <[email protected]>; Yao, Jiewen <[email protected]>; Zeng, Star 
<[email protected]>; edk2-devel-01 <[email protected]>
Cc: [email protected]; Dong, Eric <[email protected]>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common 
buffers at ExitBootServices()



On 09/06/2017 07:14 AM, Laszlo Ersek wrote:

On 09/06/17 06:37, Yao, Jiewen wrote:

Thanks for the clarification. Comment in line.

From: Laszlo Ersek [mailto:[email protected]]
Sent: Wednesday, September 6, 2017 1:57 AM
To: Yao, Jiewen <[email protected]<mailto:[email protected]>>; Zeng, Star 
<[email protected]<mailto:[email protected]>>; edk2-devel-01 
<[email protected]<mailto:[email protected]>>
Cc: Dong, Eric <[email protected]<mailto:[email protected]>>; Brijesh Singh 
<[email protected]<mailto:[email protected]>>
Subject: Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common 
buffers at ExitBootServices()

Then after ExitBootService, the OS will take control of CR3 and set correct
encryption bit.


This is true, the guest OS will set up new page tables, and in those
PTEs, the C-bit ("encrypt") will likely be set by default.

However, the guest OS will not *rewrite* all of the memory, with the
C-bit set. This means that any memory that the firmware didn't actually
re-encrypt (in the IOMMU driver) will still expose stale data to the
hypervisor.
[Jiewen] That is an interesting question.
Is there any security concern associated?


I can't tell for sure. Answering this question is up to the proponents
of SEV, who have designed the threat model for SEV.

My operating assumption is that we should keep memory as tightly
encrypted as possible at the firmware --> OS control transfer. It could
be an exaggerated expectation from my side; I'd just better be safe than
sorry :)


Let me give some brief intro on SEV (Secure Encrypted Virtualization) and then
we can discuss a security model (if needed).

SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have 
their
pages (code and data) secured such that only the guest itself has access to the
unencrypted version. Each encrypted VMs is associated with a unique encryption
key; if its data is accessed by a different entity using a different key the
encrypted guest data will be incorrectly decrypted, leading to unintelligible 
data.
You can also find some detail for SEV in white paper [1].

SEV encrypted Vs have the concept of private and shared memory. The private 
memory
is encrypted with the guest-specific key, while shared memory may be encrypted
with hypervisor key. SEV guest VMs can choose which pages they would like to
be private. But the instruction pages and guest page tables are always treated
as private by the hardware. The DMA operation inside the guest must be performed
on shared pages -- this is mainly because in virtualization world the device
DMA needs some assistance from the hypervisor.

The security model provided by the SEV ensure that hypervisor will no longer 
able
to inspect or alter any guest code or data. The guest OS controls what it want 
to
share with outside world (in this case with Hypervisor).

In software implementation we took approach to encrypt everything possible 
starting
early in boot. In this approach whenever OVMF wants to perform certain DMA 
operations
it allocate a shared page, issues the command, free the shared page after the 
command
is completed (in other word we use sw bounce buffer to complete the DMA 
operation).

We have implemented IOMMU driver to provide the following functions:

AllocateBuffer():
--------------------
it allocate a private pages, as per UEFI spec the driver will map the buffer 
allocated
from this routine using BusMasterCommonBuffer hence we allocate extra stash 
pages
in addition to requested pages.


Map
----
If requested operation is BusMasterRead or BusMasterWrite then we allocate a 
shared buffer
and DeviceAddress point to shared buffer.

If requested operation is BusMasterCommonBuffer then we perform in-place 
decryption of the
contents and update the page-table to clear the C-bit (basically make it shared 
page)

Unmap
------
If operation was BusMasterRead or BusMasterWrite then we complete the unmapping 
and free
the shared buffer allocated in Map().

If operation was BusMasterCommonBuffer then we perform in-place encryption and 
set the C-bit
(basically make the page private)

FreeBuffer:
-----------
Free the pages


Lets run with the below scenario:

1) guest marks a page as "shared" and gives its physical address to HV (e.g DMA 
read)
2) hypervisor completes the request operation
3) hypervisor caches the shared physical address and monitor it for information 
leak
4) sometime later if guest write data in its "shared" physical address then 
hypervisor can
     easily read it without guest knowledge.

SEV hardware can protect us against the attack where someone tries to inject or 
alter the
guest code. As I noted above any instruction fetch is forced as private hence 
if attacker
write some code into a shared buffer and point the RIP to his/her code then 
instruction
fetch will try to decrypt it and get unintelligible opcode. If a page was marked as 
"shared"
then its guest responsibility to make it "private" after its done communicating 
with
hypervisor.

[1] 
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

If the C-bit is cleared for a read/write buffer, is the content in the
read/write buffer also exposed to hypervisor?


Not immediately. Only when the guest also rewrites the memory through
the appropriate virtual address.

Basically, in the virtual machine, the C-bit in a PTE that covers a
particular guest virtual address (GVA) controls whether a guest write
through that GVA will result in memory encryption, and if a gues read
through that GVA will result in memory decryption.

The current state of the C-bit doesn't matter for the hypervisor, what
matters is the last guest write through a GVA whose PTE had the C-bit
set, or clear. If the C-bit was clear when the guest last wrote the
area, then the hypervisor can read the data. If the C-bit was set, then
the hypervisor can only read garbage.

I means if there is security concern, the concern should be applied to
both common buffer and read/write buffer.
Then we have to figure a way to resolve both buffer.


Yes, this is correct. The PciIo.Unmap operation, as currently
implemented in OvmfPkg/IoMmuDxe/, handles the encryption/decryption
correctly for all operations, but it can only guarantee *not* freeing
memory for CommonBuffer. So Unmap()ing CommonBuffer at ExitBootServices
is safe, while Unmap()ing Read/Write is not. The encryption would be
re-established just fine for Read/Write as well, but we would change the
UEFI memmap.

In OVMF, we currently manage this problem by making all asynchronous DMA
mappings CommonBuffer, even if they could othewise be Read or Write. We
use Read and Write only if the DMA operation completes before the
higher-level protocol function returns (so we can immediately Unmap the
operation, and the ExitBootServices() handler doesn't have to care).

To be honest, that is exactly my biggest question on this patch:
why do we only handle the common buffer specially?


For the following reason:

- Given that CommonBuffer mappings take a separate AllocateBuffer() /
FreeBuffer() call-pair, around Map/Unmap, we can *reasonably ask* PciIo
implementors -- beyond what the UEFI spec requries -- to keep *all*
dynamic memory management out of Map/Unmap. If they need dynamic memory
management, we ask them to do it in AllocateBuffer/FreeBuffer instead.
This way Unmap is safe in ExitBootServices handlers.

- We cannot *reasonably ask* PciIo implementors to provide the same
guarantee for Read and Write mappings, simply because there are no other
APIs that could manage the dynamic memory for Map and Unmap --
AllocateBuffer and FreeBuffer are not required for Read and Write
mappings. PciIo implementors couldn't agree to our request even if they
wanted to. Therefore Unmapping Read/Write operations is inherently
unsafe in EBS handlers.

NOTE: The device should still be halt state, because the device is
also controlled by OS.


This is the key point -- the device is *not* controlled by the guest OS,
in the worst case.

The virtual device is ultimately implemented by the hypervisor. We don't
expect the virtual device (= the hypervisor) to mis-behave on purpose.
However, if the hypervisor is compromised by an external attacker (for
example over the network, or via privilege escalation from another
guest), then all guest RAM that has not been encrypted up to that point
becomes visible to the attacker. In other words, the hypervisor ("the
device") becomes retro-actively malicious.
[Jiewen] If that is the threat model, I have a question on the exposure:
1) If the concern is for existing data, it means all DMA data already written.
However, the DMA data is already consumed or produced by virtual device
or a hypervisor. If the hypervisor is attacked, it already gets all the data 
content.


Maybe, maybe not. The data may have been sent to a different host over
the network, and wiped from memory.

Or, the data may have been written to a disk image that is separately
encrypted by the host OS, and has been detached (unplugged) from the
guest, and also unmounted from the host, with the disk key securely
wiped from host memory.

We can also imagine the reverse direction. Assume that the user of the
virtual machine goes into the UEFI shell in OVMF, starts the EDIT
program, and types some secret information into a text file on the ESP.
Further assume that the disk image is encrypted on the host OS. It is
conceivable that fragments of the secret could remain stuck in the AHCI
(disk) and USB (keyboard) DMA buffers.

Maybe you think that these are "made up" examples, and I agree, I just
made them up. The point is, it is not my place to discount possible
attack vectors (I haven't designed SEV). Attackers will be limited by
their *own* imaginations only, not by mine :)

2) if the concern is for future data, it means all user data to be written.
However, the C-bit already prevent that.


As far as I understand SEV, future data is out of scope. The goal is to
prevent *retroactive* guest data leaks (= whatever is currently in host
OS memory) if an attacker compromises an otherwise non-malicious hypervisor.

If an attacker compromises a virtualization host, then they can just sit
silent and watch data flow into and out of guests from that point
onward, because they can then monitor all DMA (which always happens in
clear-text) real-time.

Maybe I do not fully understand the threat model defined.
If you can explain more, that would be great.


Well I tried to explain *my* understanding of SEV :) I hope Brijesh will
correct me if I'm wrong.

The point of SEV is to keep as much guest data encrypted at all times as
possible, so that *whenever* the hypervisor is compromised by an
attacker, the guest memory -- which the attacker can inevitably dump
from the host side -- remains un-intellegible to the attacker.
[Jiewen] OK. If this is a security question, I suggest we define a clear
threat model at first.
Or what we are doing might be unnecessary or insufficient.


I agree.

As I said above, my operating principle has been to re-encrypt all DMA
buffers as soon as possible. For long-standing DMA buffers, re-encrypt
them at ExitBootServices at the latest.

[Jiewen] For "require that Unmap() work for CommonBuffer
operations without releasing memory", I would hold my opinion until
it is documented clearly in UEFI spec.


You are right, of course.

It's just that we cannot avoid exploring ways, for this feature, that
currently fall outside of the spec. Whatever we do here will be outside
of the spec, one way or another. I suggested what I thought could be a
reasonable extension to the spec, one that could be implemented by PciIo
implementors even before the spec is modified.

edk2's PciIo.Unmap already behaves like this, without breaking the spec.

If there's a better way -- for example modifying CoreExitBootServices()
in edk2, to signal IOMMU drivers separately, *after* notifying other
ExitBootServices() handlers --, that might work as well.

My current concern is:
First, this sentence is NOT defined in UEFI specification.


Correct.

Second, it might break an existing implementation or existing feature, such as 
tracing.


Correct.

Till now, I did not see any restriction in UEFI spec say: In this function, you 
are not allowed
to call memory services.
The only restriction is
1) TPL restriction, where memory service must be <= TPL_NOTIFY.
2) AP restriction, where no UEFI service/protocol is allowed for AP.


I agree.

I'm just trying to operate with the APIs currently defined by the UEFI
spec, and these assumptions were the best I could come up with.
[Jiewen] The PCI device driver must follow and *only* follow UEFI spec.
Especially the IHV Option ROM should not consume any private API.


I disagree about "only follow". If there are additional requirements
that can be satisfied without breaking the spec, driver implementors can
choose to conform to both sets of requirements.

For example (if I understand correctly), Microsoft / Windows present a
bunch of requirements *beyond* the UEFI spec, for both platform and
add-on card firmware. Most vendors conform :)

[Jiewen] I am not sure who will control "When control is transferred to the OS, 
all
guest memory should be encrypted again, even those areas that were once
used as CommonBuffers."
For SEV case, I think it should be controlled by OS, because it is just CR3.


If it was indeed just CR3, then I would fully agree with you.

But -- to my understanding --, ensuring that the memory is actually
encrypted requires that the guest *write* to the memory through a
virtual address whose PTE has the C-bit set.

And I don't think the guest OS can be expected to rewrite all of its
memory at launch. :(

[Jiewen] That is fine.
I suggest we get clear on the threat model as the first step.
The threat model for SEV usage in OVMF.

I am sorry if that is already discussed before. I might ignore the conversation.


No problem; it's always good to re-investigate our assumptions and
operating principles.

If you or any SEV owner can share the insight, that will be great.
See my question above "If that is the threat model, I have a question on the 
exposure:..."


I shared *my* impressions of the threat model (which are somewhat
unclear, admittedly, and thus could make me overly cautious).

I hope Brijesh can extend and/or correct my description.

So apparently the default behaviors of the VT-d IOMMU and the SEV IOMMU
are opposites -- VT-d permits all DMA unless configured otherwise, while
SEV forbids all DMA unless configured otherwise.
[Jiewen] I do not think so. The default behaviors of VT-d IOMMU is to disable 
all DMA access.
I setup translation table, but mark all entry to be NOT-PRESENT.
I grant DMA access only if there is a specific request by a device.

I open DMA access in ExitBootServices, just want to make sure it is compatible 
to
the existing OS, which might not have knowledge on IOMMU.
I will provide a patch to make it a policy very soon. As such VTd IOMMU may
turn off IOMMU, or keep it enabled at ExitBootService.
An IOMMU aware OS may take over IOMMU directly and reprogram it.


Thanks for the clarification.

But, again, will this not lead to the possibility that the VT-d IOMMU's
ExitBootServices() handler disables all DMA *before* the PCI drivers get
a chance to run their own ExitBootServices() handlers, disabling the
individual devices?
[Jiewen] Sorry for the confusing. Let me explain:
No, VTd never disables *all* DMA buffer at ExitBootService.

"disable VTd" means to "disable IOMMU protection" and "allow all DMA".
"Keep VTd enabled" means to "keep IOMMU protection enabled" and
"only allow the DMA from Map() request".


Okay, but this interpretation was exactly what I thought of first (see
above): "VT-d permits all DMA unless configured otherwise". You answered
that it wasn't the case.

So basically, if I understand it correctly now, at ExitBootServices the
VT-d IOMMU driver opens up all RAM for DMA access. Is that correct?

That is equivalent to decrypting all memory under SEV, and is the exact
opposite of what we want. (As I understand it.)

If that happens, then a series of IOMMU faults could be generated, which
I described above. (I.e., such IOMMU faults could result, at least in
the case of SEV, in garbage being written to disk, via queued commands.)
[Jiewen] You are right. That would not happen.
IOMMU driver should not bring any compatibility problem for the PCI driver,
iff the PCI driver follows the UEFI specification.


Agreed.

Now, to finish up, here's an idea I just had.

Are we allowed to call gBS->SignalEvent() in an ExitBootServices()
notification function?

If we are, then we could do the following:

* PciIo.Unmap() and friends would work as always (no restrictions on
    dynamic memory allocation or freeing, for any kind of DMA operation).

* PCI drivers would not be expected to call PciIo.Unmap() in their
    ExitBootServices() handlers.

* The IOMMU driver would have an ExitBootServices() notification
    function, to be enqueued at the TPL_CALLBACK or TPL_NOTIFY level
    (doesn't matter which).

* In this notification function, the IOMMU driver would signal *another*
    event (a private one). The notification function for this event would
    be enqueued strictly at the TPL_CALLBACK level.

* The notification function for the second event (private to the IOMMU)
    would iterate over all existent mappings, and unmap them, without
    allocating or freeing memory.

The key point is that by signaling the second event, the IOMMU driver
could order its own cleanup code after all other ExitBootServices()
callbacks. (Assuming, at least, that no other ExitBootServices()
callback employs the same trick to defer itself!) Therefore by the time
the second callback runs, all PCI devices have been halted, and it is
safe to tear down the translations.

The ordering would be ensured by the following:

- The UEFI spec says under CreateEventEx(), "All events are guaranteed
    to be signaled before the first notification action is taken." This
    means that, by the time the IOMMU's ExitBootServices() handler is
    entered, all other ExitBootServices() handlers have been *queued* at
    least, at TPL_CALLBACK or TPL_NOTIFY.

- Therefore, when we signal our second callback from there, for
    TPL_CALLBACK, the second callback will be queued at the end of the
    TPL_CALLBACK queue.

- CreateEventEx() also says that EFI_EVENT_GROUP_EXIT_BOOT_SERVICES "is
    functionally equivalent to the EVT_SIGNAL_EXIT_BOOT_SERVICES flag
    for the Type argument of CreateEvent." So it wouldn't matter if a
    driver used CreateEvent() or CreateEventEx() for setting up its own
    handler, the handler would be queued just the same.

I think this is ugly and brittle, but perhaps the only way to clean up
*all* translations safely, with regard to PciIo.Unmap() +
ExitBootServices(), without updating the UEFI spec.

What do you think?

[Jiewen] If the order is correct, and all PCI device driver is halt at 
ExitBootServices, that works.
We do not need update PCI driver and we do not need update UEFI spec.
We only need update IOMMU driver which is concerned, based upon the threat 
model.
That probably is best solution. :-)


I'm very glad to hear that you like the idea.

However, it depends on whether we are permitted, by the UEFI spec, to
signal another event in an ExitBootServices() notification function.

Are we permitted to do that?

Does the UEFI spec guarantee that the notification function for the
*second* event will be queued just like it would be under "normal"
circumstances?

(I know we must not allocate or free memory in the notification function
of the *second* event either; I just want to know if the second event's
handler is *queued* like it would normally be.)

I assume you want to handle both common buffer and read/write buffer, right?


Yes. Under this idea, all kinds of operations would be cleaned up.

And if possible, I still have interest to get clear on the threat model for SEV 
in OVMF.
If you or any SEV owner can share ...


Absolutely. Please see above.

Thank you!
Laszlo

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel

Re: [edk2] [PATCH 0/4] MdeModulePkg: some PCI HC drivers: unmap common buffers at ExitBootServices()

Reply via email to