This patch series adds Page Modification Logging (PML) support in VMX.

1) Introduction

PML is a new feature on Intel's Broadwell server platform, targeted at reducing
the overhead of the dirty logging mechanism.

The specification can be found at:

http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html

Currently, dirty logging is done by write protection, which write-protects guest
memory and marks dirty GFNs in dirty_bitmap on the subsequent write fault. This
works fine, except for the overhead of an additional write fault for logging
each dirty GFN. The overhead can be large if the write operations from the guest
are intensive.

PML is a hardware-assisted, efficient way of dirty logging. PML logs dirty GPAs
automatically into a 4K PML memory buffer when the CPU changes an EPT entry's
D-bit from 0 to 1. To do this, a new 4K PML buffer base address and a PML index
were added to the VMCS. The buffer holds 512 8-byte GPA entries; initially the
PML index points at the last entry, the CPU decrements it after logging each
GPA, and eventually a PML buffer full VMEXIT happens when the buffer is fully
logged.

With PML, we don't have to use write protection, so the intensive write-fault
EPT violations can be avoided, at the cost of one additional PML buffer full
VMEXIT per 512 dirty GPAs. Theoretically, this can reduce hypervisor overhead
when the guest is in dirty logging mode, and therefore more CPU cycles can be
allocated to the guest, so benchmarks in the guest are expected to perform
better compared to non-PML.

2) Design

a. Enable/Disable PML

PML is per-vcpu (per-VMCS), while the EPT table can be shared by vcpus, so we
need to enable/disable PML for all vcpus of the guest. A dedicated 4K page is
allocated for each vcpu when PML is enabled for that vcpu.

Currently, we choose to always enable PML for the guest, which means we enable
PML when creating a VCPU and never disable it during the guest's lifetime. This
avoids the complicated logic of enabling PML on demand while the guest is
running. And to eliminate potential unnecessary GPA logging in non-dirty-logging
mode, we set the D-bit manually for the slots with dirty logging disabled.
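
As an illustration only, here is a minimal sketch of what the per-vcpu PML setup
could look like. The names used (PML_ADDRESS, GUEST_PML_INDEX,
SECONDARY_EXEC_ENABLE_PML, pml_pg) are assumptions based on this cover letter,
not necessarily the exact code in the patches:

        /*
         * Illustrative sketch, not the actual patch: allocate a 4K PML buffer
         * for one vcpu and point the VMCS at it.
         */
        #define PML_ENTITY_NUM  512     /* 4K buffer / 8 bytes per GPA entry */

        static int vmx_enable_pml(struct vcpu_vmx *vmx)
        {
                u32 exec_control;

                vmx->pml_pg = alloc_page(GFP_KERNEL | __GFP_ZERO);
                if (!vmx->pml_pg)
                        return -ENOMEM;

                /* Physical base address of the 4K PML buffer. */
                vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
                /* Next entry to be written; starts at the last entry. */
                vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);

                /* Turn on the "enable PML" secondary execution control. */
                exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
                exec_control |= SECONDARY_EXEC_ENABLE_PML;
                vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);

                return 0;
        }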

b. Flush PML buffer

When userspace queries dirty_bitmap, it is possible that there are GPAs logged
in a vcpu's PML buffer, but since the buffer is not full, no VMEXIT has
happened. In this case, we'd better manually flush the PML buffer for all vcpus
and update the dirty GPAs to dirty_bitmap.

We do the PML buffer flush at the beginning of each VMEXIT. This keeps
dirty_bitmap more up to date, and also makes the logic of flushing the PML
buffer for all vcpus easier -- we only need to kick all vcpus out of guest mode,
and the PML buffer of each vcpu will be flushed automatically.
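
For illustration, the flush path might look roughly like the sketch below;
PML_ENTITY_NUM, GUEST_PML_INDEX, pml_pg and mark_page_dirty() are assumed names,
and this is not necessarily the exact code in the patches:

        /*
         * Illustrative sketch: drain one vcpu's PML buffer on VMEXIT.  The CPU
         * logs GPAs downwards from the last entry, so the valid entries are
         * [pml_idx + 1, PML_ENTITY_NUM - 1].
         */
        static void vmx_flush_pml_buffer(struct vcpu_vmx *vmx)
        {
                u64 *pml_buf = page_address(vmx->pml_pg);
                u16 pml_idx = vmcs_read16(GUEST_PML_INDEX);

                /* Nothing logged yet: index still points at the last entry. */
                if (pml_idx == PML_ENTITY_NUM - 1)
                        return;

                /* A full buffer wraps the index past 0; all entries are valid. */
                if (pml_idx >= PML_ENTITY_NUM)
                        pml_idx = 0;
                else
                        pml_idx++;

                for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
                        u64 gpa = pml_buf[pml_idx];

                        /* Propagate each logged GPA into the dirty_bitmap. */
                        mark_page_dirty(vmx->vcpu.kvm, gpa >> PAGE_SHIFT);
                }

                /* Reset the index so the CPU logs from the top again. */
                vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
        }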

3) Tests and benchmark results

I tested the specjbb benchmark, which is memory intensive, to measure PML. All
tests were done in the configuration below:

Machine (Broadwell server): 16 CPUs (1.4G) + 4G memory
Host Kernel: KVM queue branch. Transparent Hugepage disabled. C-state, P-state,
        S-state disabled. Swap disabled.

Guest: Ubuntu 14.04 with kernel 3.13.0-36-generic
Guest: 4 vcpus + 1G memory. All vcpus are pinned.

a. Compare scores with and without PML enabled.

This is to make sure PML won't bring any performance regression, as it is always
enabled for the guest.

Booting guest with graphic window (no --nographic)

        NOPML           PML

        109755          109379
        108786          109300
        109234          109663
        109257          107471
        108514          108904
        109740          107623

avg:    109214          108723

performance regression: (109214 - 108723) / 109214 = 0.45%

Booting guest without graphic window (--nographic)

        NOPML           PML

        109090          109686
        109461          110533
        110523          108550
        109960          110775
        109090          109802
        110787          109192

avg:    109818          109756

performance regression: (109818 - 109756) / 109818 = 0.06%

So there is no noticeable performance regression from leaving PML always enabled.

b. Compare specjbb scores between PML and Write Protection.

This is used to see how much performance gain PML can bring when the guest is in
dirty logging mode.

I modified qemu by adding an additional "monitoring thread" to query
dirty_bitmap periodically (once per second). With this thread, we can measure
the performance gain of PML by comparing specjbb scores under the PML code path
and the write protection code path.
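
The thread itself is not part of this series; as a standalone illustration, such
a monitoring loop boils down to calling the KVM_GET_DIRTY_LOG ioctl once per
second for a memory slot registered with KVM_MEM_LOG_DIRTY_PAGES (vm_fd, the
slot number, and the bitmap size below are placeholders, not the actual qemu
change):

        #include <linux/kvm.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <unistd.h>

        /*
         * Standalone illustration: fetch (and clear) one slot's dirty bitmap
         * via KVM_GET_DIRTY_LOG once per second.
         */
        static void *dirty_log_monitor(void *arg)
        {
                int vm_fd = *(int *)arg;        /* fd from KVM_CREATE_VM (placeholder) */
                struct kvm_dirty_log log = { .slot = 0 };       /* placeholder slot */
                size_t bitmap_bytes = 4096;     /* depends on slot size (placeholder) */

                log.dirty_bitmap = malloc(bitmap_bytes);
                if (!log.dirty_bitmap)
                        return NULL;

                for (;;) {
                        memset(log.dirty_bitmap, 0, bitmap_bytes);
                        if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
                                break;
                        sleep(1);
                }

                free(log.dirty_bitmap);
                return NULL;
        }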

Again, I collected scores both with and without the guest's graphic window.

Booting guest with graphic window (no --nographic)

                PML             WP              No monitoring thread

                104748          101358
                102934          99895
                103525          98832
                105331          100678
                106038          99476
                104776          99851

        avg:    104558          100015          108723 (== PML score in test a)

        percent: 96.17%         91.99%          100%

        performance gain:       96.17% - 91.99% = 4.18%

Booting guest without graphic window (--nographic)

                PML             WP              No monitoring thread
                
                104778          98967
                104856          99380
                103783          99406
                105210          100638
                106218          99763
                105475          99287
        
        avg:    105053          99573           109756 (== PML score in test a)

        percent: 95.72%         90.72%          100%

        performance gain:  95.72% - 90.72% = 5%

So there's a noticeable performance gain (around 4%~5%) from PML compared to
Write Protection.


Kai Huang (6):
  KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic
    for log dirty
  KVM: MMU: Add mmu help functions to support PML
  KVM: MMU: Explicitly set D-bit for writable spte.
  KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
  KVM: x86: Add new dirty logging kvm_x86_ops for PML
  KVM: VMX: Add PML support in VMX

 arch/arm/kvm/mmu.c              |  18 ++-
 arch/x86/include/asm/kvm_host.h |  37 +++++-
 arch/x86/include/asm/vmx.h      |   4 +
 arch/x86/include/uapi/asm/vmx.h |   1 +
 arch/x86/kvm/mmu.c              | 243 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/trace.h            |  18 +++
 arch/x86/kvm/vmx.c              | 195 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  78 +++++++++++--
 include/linux/kvm_host.h        |   2 +-
 virt/kvm/kvm_main.c             |   2 +-
 10 files changed, 577 insertions(+), 21 deletions(-)

-- 
2.1.0
