On Fri Nov 7, 2025 at 3:54 PM UTC, Brendan Jackman wrote:
> On Wed Sep 24, 2025 at 3:10 PM UTC, Patrick Roy wrote:
>> From: Patrick Roy <[email protected]>
>>
>> [ based on kvm/next ]
>>
>> Unmapping virtual machine guest memory from the host kernel's direct map is a
>> successful mitigation against Spectre-style transient execution issues: If the
>> kernel page tables do not contain entries pointing to guest memory, then any
>> attempted speculative read through the direct map will necessarily be blocked
>> by the MMU before any observable microarchitectural side-effects happen. This
>> means that Spectre-gadgets and similar cannot be used to target virtual machine
>> memory. Roughly 60% of speculative execution issues fall into this category [1,
>> Table 1].
>>
>> This patch series extends guest_memfd with the ability to remove its memory
>> from the host kernel's direct map, to be able to attain the above protection
>> for KVM guests running inside guest_memfd.
>>
>> Additionally, a Firecracker branch with support for these VMs can be found on
>> GitHub [2].
>>
>> For more details, please refer to the v5 cover letter [v5]. No
>> substantial changes in design have taken place since.
>>
>> === Changes Since v6 ===
>>
>> - Drop patch for passing struct address_space to ->free_folio(), due to
>>   possible races with freeing of the address_space. (Hugh)
>> - Stop using PG_uptodate / gmem preparedness tracking to keep track of
>>   direct map state.  Instead, use the lowest bit of folio->private. (Mike, David)
>> - Do direct map removal when establishing mapping of gmem folio instead
>>   of at allocation time, due to impossibility of handling direct map
>>   removal errors in kvm_gmem_populate(). (Patrick)
>> - Do TLB flushes after direct map removal, and provide a module
>>   parameter to opt out from them, and a new patch to export
>>   flush_tlb_kernel_range() to KVM. (Will)
>>
>> [1]: https://download.vusec.net/papers/quarantine_raid23.pdf
>> [2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
>
> I just got around to trying this out, I checked out this patchset using
> its base-commit and grabbed the Firecracker branch. Things seem OK until
> I set the secrets_free flag in the Firecracker config which IIUC makes
> it set GUEST_MEMFD_FLAG_NO_DIRECT_MAP.
>
> If I set it, I find the guest doesn't show anything on the console.
> Running it in a VM and attaching GDB suggests that it's entering the
> guest repeatedly, it doesn't seem like the vCPU thread is stuck or
> anything. I'm a bit clueless about how to debug that (so far, whenever
> I've broken KVM, things always exploded very dramatically).

I discovered that Firecracker has a GDB stub, so I can just attach to
that and see what the guest is up to.

The issue is that the pvclock_vcpu_time_info in kvmclock is all zero:

(gdb) backtrace
#0  pvclock_tsc_khz (src=0xffffffff83a03000 <hv_clock_boot>) at ../arch/x86/kernel/pvclock.c:28
#1  0xffffffff8109d137 in kvm_get_tsc_khz () at ../arch/x86/include/asm/kvmclock.h:11
#2  0xffffffff835c1842 in kvm_get_preset_lpj () at ../arch/x86/kernel/kvmclock.c:128
#3  kvmclock_init () at ../arch/x86/kernel/kvmclock.c:332
#4  0xffffffff835c1487 in kvm_init_platform () at ../arch/x86/kernel/kvm.c:982
#5  0xffffffff835a83df in setup_arch (cmdline_p=cmdline_p@entry=0xffffffff82e03f00) at ../arch/x86/kernel/setup.c:916
#6  0xffffffff83595a22 in start_kernel () at ../init/main.c:925
#7  0xffffffff835a7354 in x86_64_start_reservations (
    real_mode_data=real_mode_data@entry=0x36326c0 <error: Cannot access memory at address 0x36326c0>) at ../arch/x86/kernel/head64.c:507
#8  0xffffffff835a7466 in x86_64_start_kernel (real_mode_data=0x36326c0 <error: Cannot access memory at address 0x36326c0>)
    at ../arch/x86/kernel/head64.c:488
#9  0xffffffff8103e7fd in secondary_startup_64 () at ../arch/x86/kernel/head_64.S:413
#10 0x0000000000000000 in ?? ()
(gdb) p *src
$3 = {version = 0, pad0 = 0, tsc_timestamp = 0, system_time = 0, tsc_to_system_mul = 0, tsc_shift = 0 '\000', flags = 0 '\000',
  pad = "\000"}

This causes a divide by zero in kvm_get_tsc_khz().
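
For anyone following along, here is a rough userspace sketch of why an
all-zero record traps. It approximates the calculation that
pvclock_tsc_khz() performs (dividing a constant by tsc_to_system_mul,
then applying tsc_shift); it is not a copy of the kernel source. The
struct and function names with "sketch" in them are hypothetical, and the
zero check is only there so the example can run. It is not a proposed fix,
since the real question is why the record is never populated.

```c
#include <stdint.h>

/* Hypothetical subset of the pvclock record; field names follow the gdb dump. */
struct pvclock_time_info_sketch {
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
};

/*
 * Approximation of the TSC frequency calculation. Returns 0 as an
 * illustrative "record not populated" sentinel instead of dividing by zero.
 */
static uint64_t sketch_tsc_khz(const struct pvclock_time_info_sketch *src)
{
    uint64_t khz = 1000000ULL << 32;

    /* With an all-zero record, the division below would be a divide by zero. */
    if (src->tsc_to_system_mul == 0)
        return 0;

    khz /= src->tsc_to_system_mul;
    if (src->tsc_shift < 0)
        khz <<= -src->tsc_shift;
    else
        khz >>= src->tsc_shift;
    return khz;
}
```

Calling sketch_tsc_khz() on a zero-initialized struct returns the sentinel,
whereas the unguarded division is what faults in the guest here.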

Probably the only reason I didn't see any console output is that I
forgot to set earlyprintk, oops...

