On Tue, Jun 19, 2012 at 11:27 PM, Christoffer Dall
<[email protected]> wrote:
> On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity <[email protected]> wrote:
>> On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>>>> Premature, but this is sad. I suggest you split vmid generation from
>>>> next available vmid. This allows you to make the generation counter
>>>> atomic so it may be read outside the lock.
>>>>
>>>> You can do
>>>>
>>>> if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
>>>> return;
>>>>
>>>> spin_lock(...
>>>>
>>>
>>> I knew you were going to say something here :), please take a look at
>>> this and see if you agree:
>>
>> It looks reasonable wrt locking.
>>
>>> +
>>> + /* First user of a new VMID generation? */
>>> + if (unlikely(kvm_next_vmid == 0)) {
>>> + atomic64_inc(&kvm_vmid_gen);
>>> + kvm_next_vmid = 1;
>>> +
>>> + /* This does nothing on UP */
>>> + smp_call_function(reset_vm_context, NULL, 1);
>>
>> Does this imply a memory barrier? If not, smp_mb__after_atomic_inc().
>>
>
> yes, it implies a memory barrier.
>
>>> +
>>> + /*
>>> + * On SMP we know no other CPUs can use this CPU's or
>>> + * each other's VMID since the kvm_vmid_lock blocks
>>> + * them from reentry to the guest.
>>> + */
>>> +
>>> + reset_vm_context(NULL);
>>
>> These two lines can be folded as on_each_cpu().
>>
>> Don't you have a race here where you can increment the generation just
>> before guest entry?
>
> I don't think I do.
>
uh, strike that, I do.
>>
>> cpu0 cpu1
>> (vmid=0, gen=1) (gen=0)
>> -------------------------- ----------------------
>> gen == global_gen, return
>>
>> gen != global_gen
>> increment global_gen
>> smp_call_function
>> reset_vm_context
>> vmid=0
>>
>> enter with vmid=0 enter with vmid=0
>
> I can't see how the scenario above can happen. First, no one can run
> with vmid 0 - it is reserved for the host. If we bump global_gen on
> cpuN, then since we do smp_call_function(x, x, wait=1) we know that
> after this call all other CPUs that were potentially inside a VM have
> exited and called reset_vm_context. Before they can re-enter the
> guest they will call update_vttbr, and if their generation counter
> differs from global_gen, they will grab that spinlock and everything
> should happen in order.
>
the whole vmid=0 thing confused me a bit. The point is that since we
moved the generation check outside the spin_lock, we have to re-check
it before entering the guest. I see:
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 19fe3b0..74760af 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -313,6 +313,24 @@ static void reset_vm_context(void *info)
 }
 
 /**
+ * check_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM whose VMID we want to check
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool check_new_vmid_gen(struct kvm *kvm)
+{
+	return unlikely(kvm->arch.vmid_gen !=
+			atomic64_read(&kvm_vmid_gen));
+}
+
+/**
  * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
  * @kvm The guest that we are about to run
  *
@@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
 {
 	phys_addr_t pgd_phys;
 
-	/*
-	 * Check that the VMID is still valid.
-	 * (The hardware supports only 256 values with the value zero
-	 * reserved for the host, so we check if an assigned value belongs to
-	 * a previous generation, which which requires us to assign a new
-	 * value. If we're the first to use a VMID for the new generation,
-	 * we must flush necessary caches and TLBs on all CPUs.)
-	 */
-	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+	if (!check_new_vmid_gen(kvm))
 		return;
 
 	spin_lock(&kvm_vmid_lock);
@@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		preempt_disable();
 		local_irq_disable();
+
+		if (check_new_vmid_gen(vcpu->kvm)) {
+			local_irq_enable();
+			preempt_enable();
+			continue;
+		}
+
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;
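As a sanity check on the scheme itself, here is a throwaway user-space
simulation of the generation/VMID bookkeeping (single-threaded, so the
spinlock, the TLB flush and the irq-disabled re-check are elided;
sim_kvm and the globals just mirror the names in the patch, none of
this is kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

#define VMID_BITS 8                 /* hardware supports 256 VMIDs   */
#define NUM_VMIDS (1u << VMID_BITS) /* vmid 0 reserved for the host  */

static uint64_t kvm_vmid_gen = 1;   /* stands in for the atomic64    */
static uint32_t kvm_next_vmid = 1;  /* next free VMID in this gen    */

struct sim_kvm {
	uint64_t vmid_gen;          /* generation this vmid belongs to */
	uint32_t vmid;
};

/* true if the VM's vmid belongs to an old generation */
static bool check_new_vmid_gen(struct sim_kvm *kvm)
{
	return kvm->vmid_gen != kvm_vmid_gen;
}

static void update_vttbr(struct sim_kvm *kvm)
{
	if (!check_new_vmid_gen(kvm))
		return;             /* fast path: vmid still valid */

	/* First user of a new generation: bump it, restart at vmid 1.
	 * The real code would flush caches/TLBs on all CPUs here. */
	if (kvm_next_vmid == 0) {
		kvm_vmid_gen++;
		kvm_next_vmid = 1;
	}

	kvm->vmid_gen = kvm_vmid_gen;
	kvm->vmid = kvm_next_vmid++;
	kvm_next_vmid &= NUM_VMIDS - 1; /* wrap to 0 forces a new gen */
}
```

A second update_vttbr() call in the same generation is a no-op, which
is exactly the fast path the likely() check covers.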
>>
>> You must recheck gen after disabling interrupts to ensure global_gen
>> didn't bump after update_vttbr but before entry. x86 has a lot of this,
>> see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
>> your case but may come in useful later).
>>
>>>
>>>>> +
>>>>> +/**
>>>>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>>>> + * @vcpu: The VCPU pointer
>>>>> + * @run: The kvm_run structure pointer used for userspace state exchange
>>>>> + *
>>>>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>>>>> + * will execute VM code in a loop until the time slice for the process is used
>>>>> + * or some emulation is needed from user space in which case the function will
>>>>> + * return with return value 0 and with the kvm_run structure filled in with the
>>>>> + * required data for the requested emulation.
>>>>> + */
>>>>> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>> {
>>>>> - return -EINVAL;
>>>>> + int ret = 0;
>>>>> + sigset_t sigsaved;
>>>>> +
>>>>> + if (vcpu->sigset_active)
>>>>> + sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>>>>> +
>>>>> + run->exit_reason = KVM_EXIT_UNKNOWN;
>>>>> + while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>>>>
>>>> It's not a good idea to read stuff from run unless it's part of the ABI,
>>>> since userspace can play with it while you're reading it. It's harmless
>>>> here but in other places it can lead to a vulnerability.
>>>>
>>>
>>> ok, so in this case, userspace can 'suppress' an MMIO or interrupt
>>> window operation.
>>>
>>> I can change to keep some local variable to maintain the state and
>>> have some API convention for emulation functions, if you feel strongly
>>> about it, but otherwise it feels to me like the code will be more
>>> complicated. Do you have other ideas?
>>
>> x86 uses:
>>
>> 0 - return to userspace (run prepared)
>> 1 - return to guest (run untouched)
>> -ESOMETHING - return to userspace
>>
>
> yeah, we can do that I guess, that's fair.
>
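roughly like this, I suppose - a user-space sketch where the handler
names and exit codes are made up, just to pin down what the 0 / 1 /
-errno return values mean for the run loop:

```c
#include <errno.h>

/* Handler return convention (as on x86):
 *    1  -> return to guest (run struct untouched)
 *    0  -> return to userspace (run struct prepared)
 *   <0  -> error, return to userspace
 * These handlers are stand-ins, not real KVM code. */
static int handle_mmio_example(void) { return 0; } /* needs userspace */
static int handle_wfi_example(void)  { return 1; } /* back to guest   */

static int run_one_exit(int exit_kind)
{
	switch (exit_kind) {
	case 0:  return handle_mmio_example();
	case 1:  return handle_wfi_example();
	default: return -EINVAL;   /* unknown exit reason */
	}
}

/* The vcpu run loop keeps entering the guest while handlers return 1,
 * and bails out to userspace on 0 or a negative error. */
static int vcpu_run_loop_example(const int *exits, int n)
{
	int r = 1;

	for (int i = 0; i < n && r > 0; i++)
		r = run_one_exit(exits[i]);
	return r;
}
```

The local 'r' plays the role of the x86 locals of the same name, so no
state has to be read back out of the run struct.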
>> as return values from handlers and for locals (usually named 'r').
>>
>>
>> --
>> error compiling committee.c: too many arguments to function
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html