On Fri Nov 10, 2023 at 7:32 PM UTC, Sean Christopherson wrote: > On Fri, Nov 10, 2023, Nicolas Saenz Julienne wrote: > > On Wed Nov 8, 2023 at 6:33 PM UTC, Sean Christopherson wrote: > > > - What is the split between userspace and KVM? How did you arrive at > > > that split? > > > > Our original design, which we discussed in the KVM forum 2023 [1] and is > > public [2], implemented most of VSM in-kernel. Notably we introduced VTL > > awareness in struct kvm_vcpu. > > ... > > > So we decided to move all this complexity outside of struct kvm_vcpu > > and, as much as possible, out of the kernel. We figures out the basic > > kernel building blocks that are absolutely necessary, and let user-space > > deal with the rest. > > Sorry, I should have been more clear. What's the split in terms of > responsibilities, > i.e. what will KVM's ABI look like? E.g. if the vCPU=>VTLs setup is > nonsensical, > does KVM care? > > My general preference is for KVM to be as permissive as possible, i.e. let > userspace do whatever it wants so long as it doesn't place undue burden on > KVM. > But at the same time I don't to end up in a similar boat as many of the > paravirt > features, where things just stop working if userspace or the guest makes a > goof.
I'll make sure to formalize this for whenever I post a full series, I need to go over every hcall and think from that perspective. There are some rules it might make sense to enforce. But it really depends on the abstractions we settle on. KVM might not have the necessary introspection to enforce them. IMO ideally it wouldn't, VTLs should remain a user-space concept. My approach so far has been trusting QEMU is doing the right thing. Some high level examples come to mind: - Only one VTL vCPU might run at all times. - Privileged VTL interrupts have precedence over lower VTL execution. - lAPICs can only access their VTL. (Cross VTL IPIs happen through the PV interface). - Lower VTL state should be up to date when accessed from privileged VTLs (through the GET/SET_VP_REGSITER hcall). > > > - Why not make VTLs a first-party concept in KVM? E.g. rather than bury > > > info > > > in a VTL device and APIC ID groups, why not modify "struct kvm" to > > > support > > > replicating state that needs to be tracked per-VTL? Because of how > > > memory > > > attributes affect hugepages, duplicating *memslots* might actually be > > > easier > > > than teaching memslots to be VTL-aware. > > > > I do agree that we need to introduce some level VTL awareness into > > memslots. There's the hugepages issues you pointed out. But it'll be > > also necessary once we look at how to implement overlay pages that are > > per-VTL. (A topic I didn't mention in the series as I though I had > > managed to solve memory protections while avoiding the need for multiple > > slots). What I have in mind is introducing a memory slot address space > > per-VTL, similar to how we do things with SMM. > > Noooooooo (I hate memslot address spaces :-) ) > > Why not represent each VTL with a separate "struct kvm" instance? That would > naturally provide per-VTL behavior for: > > - APIC groups > - memslot overlays > - memory attributes (and their impact on hugepages) > - MMU pages Very interesting idea! I'll spend some time researching it, it sure solves a lot of issues. > The only (obvious) issue with that approach would be cross-VTL operations. > IIUC, > sending IPIs across VTLs isn't allowed, but even if it were, that should be > easy > enough to solve, e.g. KVM already supports posting interrupts from non-KVM > sources. Correct. Only through kvm_hv_send_ipi(), but from experience it happens very rarely, so performance shouldn't be critical. > GVA=>GPA translation would be trickier, but that patch suggests you want to > handle > that in userspace anyways. And if translation is a rare/slow path, maybe it > could > simply be punted to userspace? > > NOTE: The hypercall implementation is incomplete and only shared for > completion. Additionally we'd like to move the VTL aware parts to > user-space. > > Ewww, and looking at what it would take to support cross-VM translations shows > another problem with using vCPUs to model VTLs. Nothing prevents userspace > from > running a virtual CPU at multiple VTLs concurrently, which means that anything > that uses kvm_hv_get_vtl_vcpu() is unsafe, e.g. walk_mmu->gva_to_gpa() could > be > modified while kvm_hv_xlate_va_walk() is running. > > I suppose that's not too hard to solve, e.g. mutex_trylock() and bail if > something > holds the other kvm_vcpu/VTL's mutex. Though ideally, KVM would punt all > cross-VTL > operations to userspace. :-) > > If punting to userspace isn't feasible, using a struct kvm per VTL probably > wouldn't > make the locking and concurrency problems meaningfully easier or harder to > solve. > E.g. KVM could require VTLs, i.e. "struct kvm" instances that are part of a > single > virtual machine, to belong to the same process. That'd avoid headaches with > mm_struct, at which point I don't _think_ getting and using a kvm_vcpu from a > different kvm would need special handling? I'll look into it. > Heh, another fun one, the VTL handling in kvm_hv_send_ipi() is wildly broken, > the > in_vtl field is consumed before send_ipi is read from userspace. ugh, that's a tired last minute "cleanup" that went south... It's been working as intended for a while otherwise. I'll implement a kvm-unit-test to redeem myself. :) Nicolas