> From: Tian, Kevin
> Sent: Friday, May 20, 2011 4:05 PM
> 
> > From: Nadav Har'El
> > Sent: Tuesday, May 17, 2011 3:48 AM
> >
> > We saw in a previous patch that L1 controls its L2 guest with a vmcs12.
> > L0 needs to create a real VMCS for running L2. We call that "vmcs02".
> > A later patch will contain the code, prepare_vmcs02(), for filling the
> > vmcs02 fields. This patch only contains code for allocating vmcs02.
> >
> > In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
> > enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: it
> > can be reused even when L1 runs multiple L2 guests. However, in future
> > versions we'll probably want to add an optimization where vmcs02 fields that
> > rarely change will not be set each time. For that, we may want to keep
> > around several vmcs02s of L2 guests that have recently run, so that
> > potentially we could run these L2s again more quickly because fewer
> > vmwrites to vmcs02 will be needed.
> 
> That would be a neat enhancement and should yield an obvious improvement.
> Possibly we can maintain the vmcs02 pool along with L1 VMCLEAR operations,
> which is similar to the hardware behavior regarding cleared and launched state.
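To make the idea concrete, here is a minimal user-space sketch of a vmcs02 pool keyed by the guest-physical vmcs12 address, where an emulated VMCLEAR evicts the matching entry. All names and the list layout here are illustrative only, not the actual kernel code:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the kernel's gpa_t. */
typedef unsigned long long gpa_t;

/* One pooled vmcs02, keyed by the vmcs12 it was built for. */
struct pool_entry {
    gpa_t vmcs12_addr;          /* which vmcs12 this vmcs02 backs */
    struct pool_entry *next;
};

static struct pool_entry *pool;  /* most recently used first */
static int pool_num;

/* Cache a vmcs02 for the given vmcs12 address at the head of the pool. */
static void pool_add(gpa_t vmptr)
{
    struct pool_entry *e = malloc(sizeof(*e));
    e->vmcs12_addr = vmptr;
    e->next = pool;
    pool = e;
    pool_num++;
}

/*
 * Emulated VMCLEAR: drop the vmcs02 cached for this vmcs12, if any,
 * mirroring how hardware forgets launched state on VMCLEAR.
 */
static void pool_vmclear(gpa_t vmptr)
{
    struct pool_entry **pp;

    for (pp = &pool; *pp; pp = &(*pp)->next)
        if ((*pp)->vmcs12_addr == vmptr) {
            struct pool_entry *e = *pp;
            *pp = e->next;
            free(e);
            pool_num--;
            return;
        }
}
```

With this shape, the pool never holds a vmcs02 for a vmcs12 that L1 has already VMCLEARed, which keeps the software cache aligned with the hardware's cleared/launched state machine.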
> 
> >
> > This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
> > which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
> > As explained above, in the current version we choose VMCS02_POOL_SIZE=1,
> > i.e., one vmcs02 is allocated (and loaded onto the processor), and it is
> > reused to enter any L2 guest. In the future, when prepare_vmcs02() is
> > optimized not to set all fields every time, VMCS02_POOL_SIZE should be
> > increased.
> >
> > Signed-off-by: Nadav Har'El <[email protected]>
> > ---
> >  arch/x86/kvm/vmx.c |  139 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 139 insertions(+)
> >
> > --- .before/arch/x86/kvm/vmx.c      2011-05-16 22:36:47.000000000 +0300
> > +++ .after/arch/x86/kvm/vmx.c       2011-05-16 22:36:47.000000000 +0300
> > @@ -117,6 +117,7 @@ static int ple_window = KVM_VMX_DEFAULT_
> >  module_param(ple_window, int, S_IRUGO);
> >
> >  #define NR_AUTOLOAD_MSRS 1
> > +#define VMCS02_POOL_SIZE 1
> >
> >  struct vmcs {
> >     u32 revision_id;
> > @@ -166,6 +167,30 @@ struct __packed vmcs12 {
> >  #define VMCS12_SIZE 0x1000
> >
> >  /*
> > + * When we temporarily switch a vcpu's VMCS (e.g., stop using an L1's VMCS
> > + * while we use L2's VMCS), and we wish to save the previous VMCS, we must
> > + * also remember on which CPU it was last loaded (vcpu->cpu), so when we
> > + * return to using this VMCS we'll know if we're now running on a different
> > + * CPU and need to clear the VMCS on the old CPU, and load it on the new
> > + * one. Additionally, we need to remember whether this VMCS was launched
> > + * (vmx->launched), so when we return to it we know whether to VMLAUNCH or
> > + * to VMRESUME it (we cannot deduce this from other state, because it's
> > + * possible that this VMCS had once been launched, but has since been
> > + * cleared after a CPU switch).
> > + */
> > +struct saved_vmcs {
> > +   struct vmcs *vmcs;
> > +   int cpu;
> > +   int launched;
> > +};
> 
> "saved" looks a bit misleading here. It's simply a list of all active
> vmcs02s tracked by KVM, isn't it?
> 
> > +
> > +/* Used to remember the last vmcs02 used for some recently used vmcs12s */
> > +struct vmcs02_list {
> > +   struct list_head list;
> > +   gpa_t vmcs12_addr;
> 
> Use the name 'vmptr' uniformly with the nested_vmx structure:
>  /* The guest-physical address of the current VMCS L1 keeps for L2 */
>       gpa_t current_vmptr;
>       /* The host-usable pointer to the above */
>       struct page *current_vmcs12_page;
>       struct vmcs12 *current_vmcs12;
> 
> You should keep a consistent meaning for "vmcs12", which refers to the
> arch-neutral state interpreted by KVM only.
> 
> > +   struct saved_vmcs vmcs02;
> > +};
> > +
> > +/*
> >   * The nested_vmx structure is part of vcpu_vmx, and holds information we
> >   * need for correct emulation of VMX (i.e., nested VMX) on this vcpu.
> >   */
> > @@ -178,6 +203,10 @@ struct nested_vmx {
> >     /* The host-usable pointer to the above */
> >     struct page *current_vmcs12_page;
> >     struct vmcs12 *current_vmcs12;
> > +
> > +   /* vmcs02_list cache of VMCSs recently used to run L2 guests */
> > +   struct list_head vmcs02_pool;
> > +   int vmcs02_num;
> >  };
> >
> >  struct vcpu_vmx {
> > @@ -4200,6 +4229,111 @@ static int handle_invalid_op(struct kvm_
> >  }
> >
> >  /*
> > + * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12.
> > + * We could reuse a single VMCS for all the L2 guests, but we also want
> > + * the option to allocate a separate vmcs02 for each separate loaded
> > + * vmcs12 - this allows keeping them loaded on the processor, and in the
> > + * future will allow optimizations where prepare_vmcs02 doesn't need to
> > + * set all the fields on every entry if they never change.
> > + * So we keep, in vmx->nested.vmcs02_pool, a cache of size
> > + * VMCS02_POOL_SIZE (>=0) with a vmcs02 for each recently loaded vmcs12,
> > + * most recent first.
> > + *
> > + * The following functions allocate and free a vmcs02 in this pool.
> > + */
> > +
> > +static void __nested_free_saved_vmcs(void *arg)
> > +{
> > +   struct saved_vmcs *saved_vmcs = arg;
> > +
> > +   vmcs_clear(saved_vmcs->vmcs);
> > +   if (per_cpu(current_vmcs, saved_vmcs->cpu) == saved_vmcs->vmcs)
> > +           per_cpu(current_vmcs, saved_vmcs->cpu) = NULL;
> > +}
> > +
> > +/*
> > + * Free a VMCS, but before that VMCLEAR it on the CPU where it was last
> > + * loaded (the necessary information is in the saved_vmcs structure).
> > + * See also vcpu_clear() (with different parameters and side-effects).
> > + */
> > +static void nested_free_saved_vmcs(struct vcpu_vmx *vmx,
> > +           struct saved_vmcs *saved_vmcs)
> > +{
> > +   if (saved_vmcs->cpu != -1)
> > +           smp_call_function_single(saved_vmcs->cpu,
> > +                           __nested_free_saved_vmcs, saved_vmcs, 1);
> > +
> > +   free_vmcs(saved_vmcs->vmcs);
> > +}
> > +
> > +/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */
> > +static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)
> > +{
> > +   struct vmcs02_list *item;
> > +   list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
> > +           if (item->vmcs12_addr == vmptr) {
> > +                   nested_free_saved_vmcs(vmx, &item->vmcs02);
> > +                   list_del(&item->list);
> > +                   kfree(item);
> > +                   vmx->nested.vmcs02_num--;
> > +                   return;
> > +           }
> > +}
> > +
> > +/*
> > + * Free all VMCSs saved for this vcpu, except the actual vmx->vmcs.
> > + * These include the VMCSs in vmcs02_pool (except the one currently used,
> > + * if running L2), and saved_vmcs01 when running L2.
> > + */
> > +static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
> > +{
> > +   struct vmcs02_list *item, *n;
> > +   list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
> > +           if (vmx->vmcs != item->vmcs02.vmcs)
> > +                   nested_free_saved_vmcs(vmx, &item->vmcs02);
> > +           list_del(&item->list);
> > +           kfree(item);
> > +   }
> > +   vmx->nested.vmcs02_num = 0;
> > +}
> > +
> > +/* Get a vmcs02 for the current vmcs12. */
> > +static struct saved_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx)
> > +{
> > +   struct vmcs02_list *item;
> > +   list_for_each_entry(item, &vmx->nested.vmcs02_pool, list)
> > +           if (item->vmcs12_addr == vmx->nested.current_vmptr) {
> > +                   list_move(&item->list, &vmx->nested.vmcs02_pool);
> > +                   return &item->vmcs02;
> > +           }
> > +
> > +   if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) {
> > +           /* Recycle the least recently used VMCS. */
> > +           item = list_entry(vmx->nested.vmcs02_pool.prev,
> > +                   struct vmcs02_list, list);
> > +           item->vmcs12_addr = vmx->nested.current_vmptr;
> > +           list_move(&item->list, &vmx->nested.vmcs02_pool);
> > +           return &item->vmcs02;

btw, shouldn't you clear the recycled VMCS and reset its 'cpu' and 'launched'
fields?

Have you tried an SMP L2 guest?

Thanks
Kevin
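For what it's worth, a minimal sketch of the reset the recycle branch would presumably need. The field names follow struct saved_vmcs from the patch, but the vmcs_clear call and the IPI to the old CPU are omitted, so this models only the software bookkeeping, not the actual kernel fix:

```c
#include <assert.h>
#include <stddef.h>

/* Field layout follows struct saved_vmcs in the patch. */
struct saved_vmcs {
    void *vmcs;
    int cpu;        /* CPU where last loaded, -1 if none */
    int launched;   /* whether VMLAUNCH already succeeded */
};

/*
 * What the recycle branch would presumably add before reusing an entry for
 * a different vmcs12: clear the hardware VMCS on its old CPU (omitted here)
 * and reset the software-tracked state, so a stale 'launched' flag cannot
 * cause a bogus VMRESUME of a VMCS that was never launched for this vmcs12.
 */
static void recycle_reset(struct saved_vmcs *s)
{
    /* in real code, a vmcs_clear() on s->cpu would go here */
    s->cpu = -1;
    s->launched = 0;
}
```

Without such a reset, the recycled entry inherits the previous guest's 'cpu' and 'launched' values, which is exactly the hazard the question above is pointing at.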

> > +   }
> > +
> > +   /* Create a new vmcs02 */
> > +   item = (struct vmcs02_list *)
> > +           kmalloc(sizeof(struct vmcs02_list), GFP_KERNEL);
> > +   if (!item)
> > +           return NULL;
> > +   item->vmcs02.vmcs = alloc_vmcs();
> > +   if (!item->vmcs02.vmcs) {
> > +           kfree(item);
> > +           return NULL;
> > +   }
> > +   item->vmcs12_addr = vmx->nested.current_vmptr;
> > +   item->vmcs02.cpu = -1;
> > +   item->vmcs02.launched = 0;
> > +   list_add(&(item->list), &(vmx->nested.vmcs02_pool));
> > +   vmx->nested.vmcs02_num++;
> > +   return &item->vmcs02;
> > +}
> > +
> > +/*
> >   * Emulate the VMXON instruction.
> >   * Currently, we just remember that VMX is active, and do not save or even
> >   * inspect the argument to VMXON (the so-called "VMXON pointer")
> because
> > we
> > @@ -4235,6 +4369,9 @@ static int handle_vmon(struct kvm_vcpu *
> >             return 1;
> >     }
> >
> > +   INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
> > +   vmx->nested.vmcs02_num = 0;
> > +
> >     vmx->nested.vmxon = true;
> >
> >     skip_emulated_instruction(vcpu);
> > @@ -4286,6 +4423,8 @@ static void free_nested(struct vcpu_vmx
> >             vmx->nested.current_vmptr = -1ull;
> >             vmx->nested.current_vmcs12 = NULL;
> >     }
> > +
> > +   nested_free_all_saved_vmcss(vmx);
> >  }
> >
> >  /* Emulate the VMXOFF instruction */
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to [email protected]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
