This patch series adds Page Modification Logging (PML) support in VMX.

1) Introduction

PML is a new feature on Intel's Broadwell server platform, targeted at reducing
the overhead of the dirty logging mechanism.

The specification can be found at:

http://www.intel.com/content/www/us/en/processors/page-modification-logging-vmm-white-paper.html

Currently, dirty logging is done by write protection, which write-protects guest
memory and marks dirty GFNs in dirty_bitmap on the subsequent write fault. This
works fine, except for the overhead of an additional write fault for logging
each dirty GFN. The overhead can be large if the write operations from the guest
are intensive.

PML is a hardware-assisted, efficient way of dirty logging. PML logs dirty GPAs
automatically into a 4K PML memory buffer when the CPU changes an EPT entry's
D-bit from 0 to 1. To do this, a new 4K PML buffer base address and a PML index
were added to the VMCS. The buffer holds 512 8-byte GPA entries; initially the
PML index points at the last entry, the CPU decrements it after logging each
GPA, and eventually a PML buffer full VMEXIT happens when the buffer is fully
logged.

With PML, we don't have to use write protection, so the intensive write-fault
EPT violations can be avoided, at the cost of one additional PML buffer full
VMEXIT per 512 dirty GPAs. Theoretically, this can reduce hypervisor overhead
when the guest is in dirty logging mode, and therefore more CPU cycles can be
allocated to the guest, so benchmarks in the guest are expected to perform
better compared to non-PML.

2) Design

a. Enable/Disable PML

PML is per-vcpu (per-VMCS), while the EPT table can be shared by vcpus, so we
need to enable/disable PML for all vcpus of the guest. A dedicated 4K page is
allocated for each vcpu when PML is enabled for that vcpu.

Currently, we choose to always enable PML for the guest, which means we enable
PML when creating a VCPU and never disable it during the guest's lifetime. This
avoids the complicated logic of enabling PML on demand while the guest is
running. And to eliminate potential unnecessary GPA logging in non-dirty-logging
mode, we set the D-bit manually for the slots with dirty logging disabled.
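
As an illustration only, here is a minimal sketch of what the per-vcpu PML setup
could look like. The names used (PML_ADDRESS, GUEST_PML_INDEX,
SECONDARY_EXEC_ENABLE_PML, pml_pg) are assumptions based on this cover letter,
not necessarily the exact code in the patches:

        /*
         * Illustrative sketch, not the actual patch: allocate a 4K PML buffer
         * for one vcpu and point the VMCS at it.
         */
        #define PML_ENTITY_NUM  512     /* 4K buffer / 8 bytes per GPA entry */

        static int vmx_enable_pml(struct vcpu_vmx *vmx)
        {
                u32 exec_control;

                vmx->pml_pg = alloc_page(GFP_KERNEL | __GFP_ZERO);
                if (!vmx->pml_pg)
                        return -ENOMEM;

                /* Physical base address of the 4K PML buffer. */
                vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
                /* Next entry to be written; starts at the last entry. */
                vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);

                /* Turn on the "enable PML" secondary execution control. */
                exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
                exec_control |= SECONDARY_EXEC_ENABLE_PML;
                vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);

                return 0;
        }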

b. Flush PML buffer

When userspace queries dirty_bitmap, it is possible that there are GPAs logged
in a vcpu's PML buffer, but since the buffer is not full, no VMEXIT has
happened. In this case, we'd better manually flush the PML buffer for all vcpus
and update the dirty GPAs to dirty_bitmap.

We do the PML buffer flush at the beginning of each VMEXIT. This keeps
dirty_bitmap more up to date, and also makes the logic of flushing the PML
buffer for all vcpus easier -- we only need to kick all vcpus out of guest mode,
and the PML buffer of each vcpu will be flushed automatically.
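
For illustration, the flush path might look roughly like the sketch below;
PML_ENTITY_NUM, GUEST_PML_INDEX, pml_pg and mark_page_dirty() are assumed names,
and this is not necessarily the exact code in the patches:

        /*
         * Illustrative sketch: drain one vcpu's PML buffer on VMEXIT.  The CPU
         * logs GPAs downwards from the last entry, so the valid entries are
         * [pml_idx + 1, PML_ENTITY_NUM - 1].
         */
        static void vmx_flush_pml_buffer(struct vcpu_vmx *vmx)
        {
                u64 *pml_buf = page_address(vmx->pml_pg);
                u16 pml_idx = vmcs_read16(GUEST_PML_INDEX);

                /* Nothing logged yet: index still points at the last entry. */
                if (pml_idx == PML_ENTITY_NUM - 1)
                        return;

                /* A full buffer wraps the index past 0; all entries are valid. */
                if (pml_idx >= PML_ENTITY_NUM)
                        pml_idx = 0;
                else
                        pml_idx++;

                for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
                        u64 gpa = pml_buf[pml_idx];

                        /* Propagate each logged GPA into the dirty_bitmap. */
                        mark_page_dirty(vmx->vcpu.kvm, gpa >> PAGE_SHIFT);
                }

                /* Reset the index so the CPU logs from the top again. */
                vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
        }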

3) Tests and benchmark results

I tested the specjbb benchmark, which is memory intensive, to measure PML. All
tests were done in the configuration below:

Machine (Broadwell server): 16 CPUs (1.4G) + 4G memory
Host Kernel: KVM queue branch. Transparent Hugepage disabled. C-state, P-state,
        S-state disabled. Swap disabled.

Guest: Ubuntu 14.04 with kernel 3.13.0-36-generic
Guest: 4 vcpus + 1G memory. All vcpus are pinned.

a. Compare scores with and without PML enabled.

This is to make sure PML won't bring any performance regression, as it is always
enabled for the guest.

Booting guest with graphic window (no --nographic)

        NOPML           PML

        109755          109379
        108786          109300
        109234          109663
        109257          107471
        108514          108904
        109740          107623

avg:    109214          108723

performance regression: (109214 - 108723) / 109214 = 0.45%

Booting guest without graphic window (--nographic)

        NOPML           PML

        109090          109686
        109461          110533
        110523          108550
        109960          110775
        109090          109802
        110787          109192

avg:    109818          109756

performance regression: (109818 - 109756) / 109818 = 0.06%

So there is no noticeable performance regression from leaving PML always enabled.

b. Compare specjbb scores between PML and Write Protection.

This is used to see how much performance gain PML can bring when the guest is in
dirty logging mode.

I modified qemu by adding an additional "monitoring thread" to query
dirty_bitmap periodically (once per second). With this thread, we can measure
the performance gain of PML by comparing specjbb scores under the PML code path
and the write protection code path.
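
The thread itself is not part of this series; as a standalone illustration, such
a monitoring loop boils down to calling the KVM_GET_DIRTY_LOG ioctl once per
second for a memory slot registered with KVM_MEM_LOG_DIRTY_PAGES (vm_fd, the
slot number, and the bitmap size below are placeholders, not the actual qemu
change):

        #include <linux/kvm.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <unistd.h>

        /*
         * Standalone illustration: fetch (and clear) one slot's dirty bitmap
         * via KVM_GET_DIRTY_LOG once per second.
         */
        static void *dirty_log_monitor(void *arg)
        {
                int vm_fd = *(int *)arg;        /* fd from KVM_CREATE_VM (placeholder) */
                struct kvm_dirty_log log = { .slot = 0 };       /* placeholder slot */
                size_t bitmap_bytes = 4096;     /* depends on slot size (placeholder) */

                log.dirty_bitmap = malloc(bitmap_bytes);
                if (!log.dirty_bitmap)
                        return NULL;

                for (;;) {
                        memset(log.dirty_bitmap, 0, bitmap_bytes);
                        if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
                                break;
                        sleep(1);
                }

                free(log.dirty_bitmap);
                return NULL;
        }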

Again, I collected scores both with and without the guest's graphic window.

Booting guest with graphic window (no --nographic)

                PML             WP              No monitoring thread

                104748          101358
                102934          99895
                103525          98832
                105331          100678
                106038          99476
                104776          99851

        avg:    104558          100015          108723 (== PML score in test a)

        percent: 96.17%         91.99%          100%

        performance gain:       96.17% - 91.99% = 4.18%

Booting guest without graphic window (--nographic)

                PML             WP              No monitoring thread
                
                104778          98967
                104856          99380
                103783          99406
                105210          100638
                106218          99763
                105475          99287
        
        avg:    105053          99573           109756 (== PML score in test a)

        percent: 95.72%         90.72%          100%

        performance gain:  95.72% - 90.72% = 5%

So there's a noticeable performance gain (around 4%~5%) from PML compared to
Write Protection.


Kai Huang (6):
  KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic
    for log dirty
  KVM: MMU: Add mmu help functions to support PML
  KVM: MMU: Explicitly set D-bit for writable spte.
  KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access
  KVM: x86: Add new dirty logging kvm_x86_ops for PML
  KVM: VMX: Add PML support in VMX

 arch/arm/kvm/mmu.c              |  18 ++-
 arch/x86/include/asm/kvm_host.h |  37 +++++-
 arch/x86/include/asm/vmx.h      |   4 +
 arch/x86/include/uapi/asm/vmx.h |   1 +
 arch/x86/kvm/mmu.c              | 243 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/trace.h            |  18 +++
 arch/x86/kvm/vmx.c              | 195 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  78 +++++++++++--
 include/linux/kvm_host.h        |   2 +-
 virt/kvm/kvm_main.c             |   2 +-
 10 files changed, 577 insertions(+), 21 deletions(-)

-- 
2.1.0
