Here's some more text for the virtualization paper.
It talks about virtualizing pages in the guest address space.

Jens and others, you were interested in virtualizing
the page tables.  I'd be interested in hearing any
feedback on this stuff.

-Kevin

Virtualizing code/data pages:
=============================

This section talks about virtualizing one page worth of
code or data in the guest OS space, by mapping it into the
monitor space.

To help visualize things, let's assume we start out in
the following state.  We have allocated space
for, and set up, all of the necessary monitor system structures;
for example, the TSS, page directory, page tables, GDT, IDT, etc.
To be clear, these structures belong to and are used by the monitor to
implement an environment where the guest can run, but are
not accessed directly by the guest OS.  The pages used by these
structures are marked with supervisor privilege, so any access
to them by the guest (all rings are pushed down to ring3)
will generate a page fault.

All of the monitor structures are mapped into an address
space spanned by one page table (a 4MByte span).
This is done for convenience, so we can migrate these structures
within the address space efficiently and easily if the guest ever
requires the use of a linear address within that range.
Likely we have placed the monitor in a portion of the address space
which the guest doesn't use.

We have also allocated memory for the guest's physical memory,
though for now let's say we have not yet mapped any of it into
the monitor's address space.

The guest is to begin execution at a given address in
its linear address space.  As the guest begins executing at this
address, which is not yet mapped into the monitor, the monitor will
receive a page fault.  This works because we mark all unused address
space with entries in the page tables such that a page fault occurs.

The monitor uses the page fault opportunity to map the needed
page into memory, at the actual linear address expected by the guest
but at the physical address of the page of memory allocated for
the guest by the host.  This mapping takes place in the monitor's
page tables, which are the ones really used by the CPU.  The
guest page tables are used only for reference, so the monitor
knows which physical guest page to map to.  We could continue
this process, mapping in new pages on demand, only when we encounter
guest execution in new pages.

As the guest executes, it will emit many data accesses to
various memory pages.  As these pages have not yet been mapped
into the monitor's memory space (we are starting with a blank
slate), they will generate page faults, in much the same way as
did accesses to code pages above.  So, we can map these data pages
into the monitor's space on demand, as we encounter their use in
the guest.

To recap, other than some pages which hold the monitor's
data structures, we have started out with a blank address
space from the point of view of the guest OS, and dynamically
created a page table, as the guest executes.  There are a couple
of points to make here.  First, we have to rebuild the
page table this way upon every implicit or explicit change to the
PDBR (CR3) register (perhaps there is some room to optimize
here, but for now...).  Second, we don't necessarily
have to build the page tables, one page at a time.  We could
map in bigger chunks, or whole address spaces at one time,
depending upon other considerations.


Virtualizing guest system data structures:
==========================================

Previously, we talked about how to virtualize code/data
pages, mapping them into the monitor's address space.
Now let's look at how to virtualize important guest OS
data structures such as the GDT, IDT, page tables etc.
It's important to keep in mind that the structures really
used by the CPU are the monitor's copies, stored in
supervisor-permission pages, and thus
inaccessible to the guest, which runs at ring3.

As the guest makes a mode transition (for example, into
protected mode), or attempts to change the value of a register
which points to these structures (for example, via the LGDT
instruction), the monitor will receive an exception, since these
instructions are all protected from being executed in ring3.  (We can
also virtualize arbitrary instructions with the SBE logic.)
The monitor uses the exception as a chance to emulate the
offending instruction.  At this point, we can see where the
new value in the register points.  By examining the data
at that address in the guest address space, we can build
'virtualized' values in the corresponding monitor structures.
As we know the size of a given data structure, we can also
determine the range of guest address space occupied, and thus
the pages which are spanned.  Since the monitor needs to be
aware of any read or write accesses to such regions to virtualize
the guest GDT, IDT, etc, it must mark these pages as inaccessible
by the guest running at ring3.

The monitor will then receive a page fault any time the
guest attempts an access to a protected region.  The fault
handler in the monitor will have to carry out the access on
behalf of the guest and then, knowing the affected
addresses, update its corresponding entries in whatever
structure was modified.


Virtualizing a real page fault in the guest:
============================================

Page faults are a normal part of OS protection mechanisms
and of any paging strategy, so they will occur naturally in
an OS.  Since our virtualization strategies rely heavily on the
paging protections, our page fault handler needs to distinguish
between valid guest page faults and ones generated for virtualization
purposes.  Fortunately, this is not difficult, since we have access
to the guest page tables and our monitor data.  We can simply
examine the guest page tables to determine if a page fault should
have been generated naturally by the guest.  If this is the case,
we have to effect (emulate) a page fault for the guest.


Virtualization of non-existent physical guest memory:
=====================================================

For situations where the guest OS probes physical memory
to determine the amount of memory installed, we must handle
this in such a way that the guest concludes that memory
beyond the amount we have allocated does not exist.

One approach would be to make sure that all such accesses
are covered by protections in the page tables, and result
in a page fault.  We would then have to virtualize the
access on behalf of the guest, and continue.  For writes,
we would ignore the access; for reads, we would return a
value representative of non-existent memory.

The problem with this approach is that it involves heavy
overhead to handle the execution of such guest code.

A different and more efficient approach would be to find a
truly unused physical memory region (spanning one aligned page)
in the host, and then map this page-size region into the address
space wherever the guest page tables point to non-existent physical
memory.  Or, for guest code running in non-paged mode, map this page
to all of the linear address space above the size of physical
guest memory.  Then we could let the guest access such non-existent
memory from then on, and it will behave just like non-existent
memory.  Any comments/caveats/warnings here?
