On Thu, Jan 15, 2026 at 03:21:40PM -0800, Jim Mattson wrote:
> When the vCPU is in guest mode with nested NPT enabled, guest accesses to
> IA32_PAT are redirected to the gPAT register, which is stored in
> vmcb02->save.g_pat.
>
> Non-guest accesses (e.g. from userspace) to IA32_PAT are always redirected
> to hPAT, which is stored in vcpu->arch.pat.
>
> This is architected behavior. It also makes it possible to restore a new
> checkpoint on an old kernel with reasonable semantics. After the restore,
> gPAT will be lost, and L2 will run on L1's PAT. Note that the old kernel
> would have always run L2 on L1's PAT.
This creates a difference in MSR_IA32_CR_PAT handling between nested SVM and
nested VMX, right? AFAICT, reading MSR_IA32_CR_PAT while an L2 VMX guest is
running will read L2's PAT. With this change, the same scenario on SVM will
read L1's PAT.

We can claim that it was always L1's PAT though, because we've always been
running L2 with L1's PAT. I am just raising this in case it's a problem to
have different behavior for SVM and VMX. I understand that we need to do this
to be able to save/restore L1's PAT with SVM in guest mode and maintain
backward compatibility.

IIUC, VMX does not have the same issue because host and guest PAT are both in
vmcs12 and are both saved/restored appropriately.

> Signed-off-by: Jim Mattson <[email protected]>
> ---
>  arch/x86/kvm/svm/svm.c | 31 ++++++++++++++++++++++++-------
>  1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7041498a8091..3f8581adf0c1 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2846,6 +2846,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_AMD64_DE_CFG:
>  		msr_info->data = svm->msr_decfg;
>  		break;
> +	case MSR_IA32_CR_PAT:
> +		if (!msr_info->host_initiated && is_guest_mode(vcpu) &&
> +		    nested_npt_enabled(svm))
> +			msr_info->data = svm->vmcb->save.g_pat; /* gPAT */
> +		else
> +			msr_info->data = vcpu->arch.pat; /* hPAT */
> +		break;
>  	default:
>  		return kvm_get_msr_common(vcpu, msr_info);
>  	}
> @@ -2929,14 +2936,24 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>
>  		break;
>  	case MSR_IA32_CR_PAT:
> -		ret = kvm_set_msr_common(vcpu, msr);
> -		if (ret)
> -			break;
> +		if (!kvm_pat_valid(data))
> +			return 1;
>
> -		svm->vmcb01.ptr->save.g_pat = data;

This is a bug fix, L2 is now able to alter L1's PAT, right?
> -		if (is_guest_mode(vcpu))
> -			nested_vmcb02_compute_g_pat(svm);
> -		vmcb_mark_dirty(svm->vmcb, VMCB_NPT);

This looks like another bug fix: it seems like we'd update vmcb01 but clear
the clean bit in vmcb02 if we're in guest mode. Probably worth calling these
out (and CC: stable, Fixes:.., etc.)?

We probably need a comment here explaining the gPAT vs. hPAT case. I don't
think it's super obvious why we only redirect L2's own accesses to its PAT
but not userspace's.

> +		if (!msr->host_initiated && is_guest_mode(vcpu) &&
> +		    nested_npt_enabled(svm)) {
> +			svm->vmcb->save.g_pat = data; /* gPAT */
> +			vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
> +		} else {
> +			vcpu->arch.pat = data; /* hPAT */

Should we call kvm_set_msr_common() here instead of setting vcpu->arch.pat?
The kvm_pat_valid() call would be redundant, but that should be fine. My main
concern is if kvm_set_msr_common() gains more logic for MSR_IA32_CR_PAT that
isn't reflected here. Probably unlikely tho..

> +			if (npt_enabled) {
> +				svm->vmcb01.ptr->save.g_pat = data;
> +				vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_NPT);
> +				if (is_guest_mode(vcpu)) {

IIUC (with the fix you mentioned), this is because L1 and L2 share the PAT if
nested NPT is disabled, and if we're already in guest mode then we also need
to update vmcb02 (as it was already created based on vmcb01 with the old PAT).
Probably worth a comment.

> +					svm->vmcb->save.g_pat = data;
> +					vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
> +				}
> +			}
> +		}
>  		break;
>  	case MSR_IA32_SPEC_CTRL:
>  		if (!msr->host_initiated &&
> --
> 2.52.0.457.g6b5491de43-goog
>

