Muli Ben-Yehuda wrote:
On Wed, Jun 18, 2008 at 03:48:33PM -0500, Anthony Liguori wrote:

Right.  But this is not ideal.  Instead of pinning up-front, it
would make more sense IMHO to build the VT-d table as the shadow
page table gets faulted in.  In certain circumstances, this will
result in extraneous updates (because a GPA=>HPA mapping is already
present) and that's where we should eliminate iotlb flushes.
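A toy model of the idea above, not taken from any KVM patch: the I/O page table is filled from the shadow-fault path, and an update that repeats an existing GPA=>HPA mapping is dropped without a flush. The class and method names are hypothetical. The model also assumes that filling a previously empty entry needs no iotlb flush, which depends on whether the hardware caches not-present entries.

```python
class ShadowDrivenIoTable:
    """Toy I/O page table populated as shadow page faults are resolved."""

    def __init__(self):
        self.entries = {}        # GPA page number -> HPA page number
        self.iotlb_flushes = 0   # how many flushes we would have issued

    def on_shadow_fault(self, gpa_page, hpa_page):
        """Hypothetical hook called when the shadow MMU maps gpa_page.

        Returns True if the I/O table was changed, False if the update
        was extraneous (the same mapping was already present).
        """
        if self.entries.get(gpa_page) == hpa_page:
            return False  # extraneous update: no write, no flush

        replaced_live_entry = gpa_page in self.entries
        self.entries[gpa_page] = hpa_page
        if replaced_live_entry:
            # Assumption: only a changed translation can be stale in the iotlb.
            self.iotlb_flushes += 1
        return True
```

The point of the sketch is only that the "already present" check sits in front of the flush, so repeated faults on the same page cost nothing on the IOMMU side.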

As Ben wrote, we can't do this and must fault everything in up-front
(assuming no PVDMA API). Assume we don't do this: it is valid for the
guest to program the device with a GPA that does not yet have a
corresponding HPA (because the guest did not write or read to/from it
and thus we haven't yet faulted in a frame for it). Then, once the
device DMAs to it, the DMA will be stopped incorrectly.
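The failure described above can be shown with a small toy model. It is only an illustration: the table class, the fake frame allocator, and the page numbers are all made up, and a real IOMMU would raise a fault rather than return None.

```python
class ToyIoPageTable:
    """Toy stand-in for the device-side GPA -> HPA translation table."""

    def __init__(self):
        self.entries = {}

    def map(self, gpa_page, hpa_page):
        self.entries[gpa_page] = hpa_page

    def device_dma(self, gpa_page):
        # A device access succeeds only if a translation exists;
        # None models a blocked DMA.
        return self.entries.get(gpa_page)


def fake_alloc_frame(gpa_page):
    # Hypothetical allocator: hands out a distinct host frame per guest page.
    return 0x1000 + gpa_page


def build_up_front(guest_pages):
    """Fault in and map every guest page before the guest runs."""
    table = ToyIoPageTable()
    for gpa_page in range(guest_pages):
        table.map(gpa_page, fake_alloc_frame(gpa_page))
    return table


def build_on_cpu_fault(touched_pages):
    """Map only the pages the guest CPU has already touched."""
    table = ToyIoPageTable()
    for gpa_page in touched_pages:
        table.map(gpa_page, fake_alloc_frame(gpa_page))
    return table
```

With a 4-page guest whose CPU has touched only pages 0 and 1, `build_on_cpu_fault([0, 1]).device_dma(3)` finds no translation, while `build_up_front(4).device_dma(3)` does. The guest did nothing wrong in either case; it simply handed the device a buffer it had not yet touched.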

As I've said, the lack of PVDMA API is a special case. The key is to use the same internal infrastructure.

Obviously, pinning the entire guest is not desirable since we waste
a lot of memory resources, but this is the approach that we
currently have. Do you find it good enough to merge into the
main KVM tree now and optimize later?

No, it's not safe.  What happens if you mmap(MAP_FIXED) into
phys_ram_base?  We need to use MMU notifiers to handle such events
and appropriately flush the iotlb.

Could you elaborate on what you mean here and what is not safe? Our
current approach is to just fault in all of guest memory---are you
concerned about a case where some of the guest frames get replaced by
other frames because of the mmap()?

Because the guest is now accessing memory that is not guest memory. When mmu-notifiers forcefully change a mapping, we need to react to it.
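To make the hazard concrete, here is a toy model with an explicit translation cache. Everything in it is hypothetical (it does not use the real mmu-notifier or VT-d interfaces); it only shows why changing the host mapping without reacting on the IOMMU side leaves the device pointing at the old frame.

```python
class ToyIommu:
    """Toy IOMMU: an I/O page table plus a cache of recent translations."""

    def __init__(self):
        self.table = {}    # GPA page -> HPA page
        self.iotlb = {}    # cached translations
        self.flushes = 0

    def map(self, gpa_page, hpa_page):
        self.table[gpa_page] = hpa_page

    def translate(self, gpa_page):
        # Cached translations win, exactly like a hardware TLB.
        if gpa_page in self.iotlb:
            return self.iotlb[gpa_page]
        hpa_page = self.table.get(gpa_page)
        if hpa_page is not None:
            self.iotlb[gpa_page] = hpa_page
        return hpa_page

    def unmap_and_flush(self, gpa_page):
        self.table.pop(gpa_page, None)
        self.iotlb.pop(gpa_page, None)
        self.flushes += 1


def on_host_mapping_changed(iommu, gpa_page, new_hpa_page=None):
    """Hypothetical notifier callback: the host replaced or removed the
    frame backing gpa_page, so drop the old translation before reuse."""
    iommu.unmap_and_flush(gpa_page)
    if new_hpa_page is not None:
        iommu.map(gpa_page, new_hpa_page)
```

If the table entry is overwritten directly after a translation has been cached, `translate()` keeps returning the old frame. Routing the change through `on_host_mapping_changed()` drops the cached entry first, so the next device access sees the new frame.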

I'd like to stress that we are shooting at the moment for the simplest
possible solution that is good enough, so that we'll be able to
finally merge this into the tree...

I don't think what I'm suggesting is more code than the current implementation and it fits more cleanly into KVM.

I'm not sure how we can do that... the guest can send a guest
physical address to the device for DMA, even without generating a
page-fault on the host for that address... which implies that the
host must pin the entire guest memory in advance. Agree?

See above.  Ideally we would wait until the first PCI config space
access for a device before special casing the guest.  Otherwise,
there's no way to allow a DMA-aware guest to avoid pinning up front.

Err, if the user gave the guest pass-through access to a PCI device,
presumably it is because the guest will use it... What do we win by
delaying the inevitable?

s/DMA-aware/PVDMA-aware/

You do not know whether a guest is PVDMA-aware until the guest tells you so. If you pin all of memory before the guest starts running, you may have allocated memory that was never needed. As we move to cooperative memory management between the host and guest, I expect the normal circumstance will be to launch a guest with far more memory than it needs, relying on the fact that the guest will not touch that memory. Pinning memory unconditionally defeats this.

In terms of merging, I don't think it's going to be reasonable to merge for 2.6.27 so there's not much of an argument for not doing it correctly.

Regards,

Anthony Liguori

Cheers,
Muli

