On Fri, Jun 26, 2026, David Woodhouse wrote:
> On Thu, 2026-06-25 at 16:09 -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > > index 91fd3673c09a..c16b4560c9e7 100644
> > > --- a/arch/x86/kvm/xen.c
> > > +++ b/arch/x86/kvm/xen.c
> > > @@ -907,6 +907,13 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, 
> > > struct kvm_xen_vcpu_attr *data)
> > >  {
> > >   int idx, r = -ENOENT;
> > >  
> > > + /*
> > > +  * kvm_xen_write_hypercall_page() manages its own locking.
> > > +  * Handle it before taking xen_lock to avoid a deadlock.
> > 
> > Do we actually want the side effects that necessitate taking xen.xen_lock?  
> > From
> > a uAPI perspective, it's odd to effectively bundle 
> > KVM_XEN_ATTR_TYPE_LONG_MODE
> > into KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE.
> 
> That's *guest* ABI, and it's derived from Xen behaviour. Xen will
> 'latch' its idea of whether a guest VM is 32-bit or 64-bit, for the
> purpose of shared data structures (shared_info page, vcpu_info,
> runstate).
> 
> Xen latches this from the current mode of the running vCPU in *two*
> places:
>  • When the hypercall MSR is invoked
>  • When the guest sets the event channel GSI (HVM_PARAM_CALLBACK_IRQ).
> 
> Thus far, the former has been handled in the kernel (in the code you're
> looking at), while the latter is why we have the ioctl to explicitly
> latch the guest's long_mode from userspace too, as userspace handles
> the HVMOP_set_param calls.

Right, and I'm pointing out that from a KVM uAPI perspective, bundling the first
one in a "write hypercall page" call is rather odd, especially since there's
already uAPI to handle the latching.

> > The other question is, why does kvm_xen_write_hypercall_page() drop xen_lock
> > when writing guest memory?  That seems odd and unnecessary.
> 
> Huh? It takes the lock to do the thing that needs the lock, then drops
> it. That is not "odd and unnecessary" at all.
>
> You've been spending too long with these scope-guarded locks.

No, I'm asking why KVM doesn't serialize the writes to guest memory.  Usually
when KVM writes to guest memory, KVM is emulating something that is very much
vCPU-specific, and so if there are races it's the guest's problem to deal with.

The Xen MSR here is clearly VM-scoped though, which is why it feels odd to take
a per-VM lock, and then deliberately drop the lock before completing the 
operation,
In practice it shouldn't matter, since it sounds like the same repeating 16 byte
pattern will be written every time, but it was a bit head-scratching when 
reading
the code.

> > > + if (data->type == KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE)
> > > +         return kvm_xen_write_hypercall_page(vcpu, data->u.gpa) ? -EIO : 
> > > 0;
> > 
> > -EIO is rather weird, wouldn't -EINVAL be more appropriate?  Ah, and both 
> > are
> > wrong if copying the blob fails.
> 
> -EINVAL is more for "you asked me to do something that doesn't make sense".
> -EIO is for "something went wrong when I tried".

Sure, but KVM returns EINVAL for pretty much every ioctl (or ioctl-like thing)
if userspace provides bad input, e.g. for the @data param.
 
> Arguably, the thing that's most likely to go wrong is the
> kvm_vcpu_write_guest() where it writes instructions[] to the guest, and
> maybe that ought to be -EFAULT?

Heh, ya, I just say that too when looking at the code again.

> But I'm not sure that's quite the right semantic to return from the ioctl?

We can/should return whatever kvm_vcpu_write_guest() returns, i.e. literally
return its result directly.  Which of course is only ever going to be -EFAULT,
but in the extremely unlikely case that ever changes, we won't have to worry
about creating misleading behavior in the Xen code.

> > >   mutex_lock(&vcpu->kvm->arch.xen.xen_lock);
> > >   idx = srcu_read_lock(&vcpu->kvm->srcu);
> > 
> > Speaking of writing memory, kvm_xen_write_hypercall_page() expects the 
> > caller
> > to be in a read-side SRCU critical section (I didn't actually run this with
> > PROVE_LOCKING=y, but I don't think I'm missing anything?)
> 
> Yes, good catch. Thanks.
> 
> > So, if this uAPI is unavoidable seems like we want something like the below.
> > Either that or guard all of kvm_xen_write_hypercall_page() with a lock, and 
> > put
> > the entire thing in a helper so that 
> > KVM_XEN_VCPU_ATTR_TYPE_WRITE_HYPERCALL_PAGE
> > can be handled in a case-statement and doesn't need to grab SRCU on its own.
> 
> Makes sense (with the test, of course). Want me to put them together
> and resend?

Yes please.

Reply via email to