Hi Christoffer,

On Wed, Feb 22, 2017 at 1:23 PM, Christoffer Dall <[email protected]> wrote:

> Hi Jintack,
>
>
> On Mon, Jan 09, 2017 at 01:23:56AM -0500, Jintack Lim wrote:
> > Nested virtualization is the ability to run a virtual machine inside
> > another virtual machine. In other words, it’s about running a hypervisor
> > (the guest hypervisor) on top of another hypervisor (the host hypervisor).
> >
> > This series supports nested virtualization on arm64. ARM recently
> > announced an extension (ARMv8.3) which has support for nested
> > virtualization[1]. This series is based on the ARMv8.3 specification.
> >
> > Supporting nested virtualization means that the hypervisor provides VMs
> > not only with the EL0/EL1 execution environment, as it usually does, but
> > also with the virtualization extensions, including the EL2 execution
> > environment. Once the host hypervisor provides those execution
> > environments to the VMs, the guest hypervisor can run its own VMs (nested
> > VMs) naturally.
> >
> > To support nested virtualization on ARM the hypervisor must emulate a
> > virtual execution environment consisting of EL2, EL1, and EL0, as the
> > guest hypervisor will run in a virtual EL2 mode.  Normally KVM/ARM only
> > emulates a VM supporting EL1/0 running in their respective native CPU
> > modes, but with nested virtualization we deprivilege the guest hypervisor
> > and emulate a virtual EL2 execution mode in EL1 using the hardware
> > features provided by ARMv8.3 to trap EL2 operations to EL1. To do that the
> > host hypervisor needs to manage EL2 register state for the guest
> > hypervisor, and shadow EL1 register state that reflects the EL2 register
> > state to run the guest hypervisor in EL1. See patches 6 through 10 for
> > this.
> >
> > For memory virtualization, the biggest issue is that we now have more
> > than two stages of translation when running nested VMs. We choose to
> > merge two stage-2 page tables (one from the guest hypervisor and the
> > other from the host hypervisor) and create shadow stage-2 page tables,
> > which have mappings from the nested VM’s physical addresses to the
> > machine physical addresses. Stage-1 translation is done by the hardware
> > as is done for the normal VMs.
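
[Illustrative aside, not code from this series.] A minimal sketch of the
shadow stage-2 idea described above, assuming toy single-level tables indexed
by page frame number. The names `s2_table`, `s2_walk` and `shadow_map` are
hypothetical and do not correspond to anything in the patches; real stage-2
tables are multi-level and carry more attributes than a write bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES  8            /* toy address space: 8 page frames */
#define INVALID UINT64_MAX   /* marks an unmapped entry */

/* Toy stage-2 table: index = input page frame, value = output page frame. */
struct s2_table {
    uint64_t out_pfn[NPAGES];
    bool     writable[NPAGES];
};

static void s2_init(struct s2_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < NPAGES; i++) {
        t->out_pfn[i] = INVALID;
        t->writable[i] = false;
    }
}

/* Translate one page frame; returns false on a translation fault. */
static bool s2_walk(const struct s2_table *t, uint64_t in_pfn,
                    uint64_t *out_pfn, bool *writable)
{
    if (in_pfn >= NPAGES || t->out_pfn[in_pfn] == INVALID)
        return false;
    *out_pfn = t->out_pfn[in_pfn];
    *writable = t->writable[in_pfn];
    return true;
}

/*
 * Fill one shadow entry by composing the two stage-2 translations:
 *   nested VM PA --(guest hypervisor's stage-2)--> guest hypervisor's VM PA
 *                --(host hypervisor's stage-2)---> machine PA
 * Permissions are intersected, so the shadow entry never grants more
 * than either table allows. Returns false if either walk faults.
 */
static bool shadow_map(const struct s2_table *guest_s2,
                       const struct s2_table *host_s2,
                       struct s2_table *shadow, uint64_t nested_pfn)
{
    uint64_t l1_pfn, machine_pfn;
    bool guest_w, host_w;

    if (!s2_walk(guest_s2, nested_pfn, &l1_pfn, &guest_w))
        return false;   /* fault in the guest hypervisor's table */
    if (!s2_walk(host_s2, l1_pfn, &machine_pfn, &host_w))
        return false;   /* fault in the host hypervisor's table */

    shadow->out_pfn[nested_pfn] = machine_pfn;
    shadow->writable[nested_pfn] = guest_w && host_w;
    return true;
}
```

In this sketch the hardware would then walk only `shadow` for the nested VM,
so a single stage-2 lookup covers both software-defined translations.
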
> >
> > To provide VGIC support to the guest hypervisor, we emulate the GIC
> > virtualization extensions using trap-and-emulate to a virtual GIC
> > Hypervisor Control Interface.  Furthermore, we can still use the GIC VE
> > hardware features to deliver virtual interrupts to the nested VM, by
> > directly mapping the GIC VCPU interface to the nested VM and switching
> > the content of the GIC Hypervisor Control interface when alternating
> > between a nested VM and a normal VM. See patches 25 through 32, and 50
> > through 52 for more information.
> >
> > For timer virtualization, the guest hypervisor expects to have access to
> > the EL2 physical timer, the EL1 physical timer and the virtual timer. So,
> > the host hypervisor needs to provide all of them. The virtual timer is
> > always available to VMs. The physical timer is available to VMs via my
> > previous patch series[3]. The EL2 physical timer is not supported yet in
> > this RFC. We plan to support this as it is required to run other guest
> > hypervisors such as Xen.
> >
> > Even though this work is not complete (see limitations below), I'd
> > appreciate early feedback on this RFC. Specifically, I'm interested in:
> > - Is it better to have a kernel config or to make it configurable at
> >   runtime?
> > - I wonder if the data structure for memory management makes sense.
> > - What architecture version do we support for the guest hypervisor, and
> >   how?
> >   For example, do we always support all architecture versions or the same
> >   architecture as the underlying hardware platform? Or is it better
> >   to make it configurable from the userspace?
> > - Initial comments on the overall design?
> >
> > This patch series is based on kvm-arm-for-4.9-rc7 with the patch series
> > to provide VMs with the EL1 physical timer[2].
> >
> > Git: https://github.com/columbia/nesting-pub/tree/rfc-v1
> >
> > Testing:
> > We have tested this on ARMv8.0 (Applied Micro X-Gene)[3] since ARMv8.3
> > hardware is not available yet. We have paravirtualized the guest
> > hypervisor to trap to EL2 as specified in the ARMv8.3 specification,
> > using the hvc instruction. We plan to test this on an ARMv8.3 model, and
> > will post the result and v2 if necessary.
> >
> > Limitations:
> > - This patch series only supports arm64, not arm. All the patches
> >   compile on arm, but I haven't tried to boot normal VMs on it.
> > - The guest hypervisor with VHE (ARMv8.1) is not supported in this RFC.
> >   I have patches for that, but they need to be cleaned up.
> > - Recursive nesting (i.e. emulating ARMv8.3 in the VM) is not tested yet.
> > - Other hypervisors (such as Xen) on KVM are not tested.
> >
> > TODO:
> > - Test to boot normal VMs on arm architecture
> > - Test this on ARMv8.3 model
> > - Support the guest hypervisor with VHE
> > - Provide the guest hypervisor with the EL2 physical timer
> > - Run other hypervisors such as Xen on KVM
> >
>
> I have a couple of overall questions and comments on this series:


> First, I think we should make sure that the series actually works with
> v8.3 on the model using both VHE and non-VHE for the host hypervisor.
>

I agree. I'll send out v2 once I make this work with the v8.3 model.


>
> Second, this patch set is pretty large overall and it would be great if
> we could split it up into some slightly more manageable bits.  I'm not
> exactly sure how to do that, but perhaps we can rework it so that we add bits
> of framework (CPU, memory, interrupt, timers) as individual series, and
> finally we plug all the logic together with the current flow.  What do
> you think?
>

I think it sounds great. I can start with the CPU patch series.


>
> Third, we should follow the feedback from David about not using a kernel
> config option.  I'm afraid that some code will bitrot too fast if guided
> by a kernel config option, so a runtime parameter and using static keys
> where relevant seems like a better approach to me.  But since KVM/ARM is
> not loaded as a module, this would have to be a kernel cmdline
> parameter.  What do people think?
>
> Fourth, there are some places where we have hard-coded information (like
> the location of the GICH/GICV interfaces) which have to be fixed by
> adding the required userspace interfaces.
>

Right. I'll fix them, and in the cover letter I'll provide a link to the
userspace changes for this nesting work.


>
> Fifth, the ordering of the patches needs a bit of love. I think it's
> important that we build the whole infrastructure first, but leave it
> completely disabled until the end, and then we plug in all the
> capabilities of userspace to create a nested VM in the end.  So for
> example, I would expect that patch 03 would be the last patch in the
> series.
>

Ah, I got it. I'll reorder the patches accordingly.


>
> Overall though, this is a massive amount of work, and it's awesome that
> you were able to pull it together to a pretty nice initial RFC!
>

Thanks a lot for your help and reviews. I'll address individual reviews
soon :)

Thanks,
Jintack


>
> Thanks!
> -Christoffer
>
>
_______________________________________________
kvmarm mailing list
[email protected]
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm