Hi Christoffer,

On Wed, Feb 22, 2017 at 1:23 PM, Christoffer Dall
<[email protected]> wrote:
> Hi Jintack,
>
> On Mon, Jan 09, 2017 at 01:23:56AM -0500, Jintack Lim wrote:
> > Nested virtualization is the ability to run a virtual machine inside
> > another virtual machine. In other words, it's about running a
> > hypervisor (the guest hypervisor) on top of another hypervisor (the
> > host hypervisor).
> >
> > This series supports nested virtualization on arm64. ARM recently
> > announced an extension (ARMv8.3) which has support for nested
> > virtualization[1]. This series is based on the ARMv8.3
> > specification.
> >
> > Supporting nested virtualization means that the hypervisor provides
> > VMs not only with the usual EL0/EL1 execution environment, but also
> > with the virtualization extensions, including an EL2 execution
> > environment. Once the host hypervisor provides that environment to
> > its VMs, the guest hypervisor can naturally run its own VMs (nested
> > VMs).
> >
> > To support nested virtualization on ARM, the hypervisor must emulate
> > a virtual execution environment consisting of EL2, EL1, and EL0, as
> > the guest hypervisor will run in a virtual EL2 mode. Normally
> > KVM/ARM only emulates a VM supporting EL1/EL0 running in their
> > respective native CPU modes, but with nested virtualization we
> > deprivilege the guest hypervisor and emulate a virtual EL2 execution
> > mode in EL1, using the hardware features provided by ARMv8.3 to trap
> > EL2 operations to EL1. To do that, the host hypervisor needs to
> > manage EL2 register state for the guest hypervisor, and shadow EL1
> > register state that reflects the EL2 register state, in order to run
> > the guest hypervisor in EL1. See patches 6 through 10 for this.
> >
> > For memory virtualization, the biggest issue is that we now have
> > more than two stages of translation when running nested VMs. We
> > choose to merge the two stage-2 page tables (one from the guest
> > hypervisor and the other from the host hypervisor) and create shadow
> > stage-2 page tables, which map the nested VM's physical addresses to
> > the machine physical addresses. Stage-1 translation is done by the
> > hardware, as it is for normal VMs.
> >
> > To provide VGIC support to the guest hypervisor, we emulate the GIC
> > virtualization extensions using trap-and-emulate to a virtual GIC
> > Hypervisor Control Interface. Furthermore, we can still use the GIC
> > VE hardware features to deliver virtual interrupts to the nested VM,
> > by directly mapping the GIC VCPU interface to the nested VM and
> > switching the contents of the GIC Hypervisor Control Interface when
> > alternating between a nested VM and a normal VM. See patches 25
> > through 32, and 50 through 52, for more information.
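To make the virtual EL2 part above a bit more concrete, here is a
rough sketch of the shadow-register idea. Every identifier below is
hypothetical and for illustration only, not the actual patch code:

/*
 * Illustrative sketch; all names are made up.  The vcpu carries a
 * file of virtual EL2 system registers.  When the deprivileged guest
 * hypervisor runs (in EL1), the hardware EL1 registers are loaded
 * from that virtual EL2 state.
 */
struct vel2_sysregs {
	u64 sctlr_el2;	/* guest hypervisor's system control register */
	u64 ttbr0_el2;	/* its stage-1 translation table base */
	u64 vbar_el2;	/* its exception vector base */
	u64 hcr_el2;	/* its own virtualization controls */
};

static void load_shadow_el1_state(struct nested_vcpu *vcpu)
{
	struct vel2_sysregs *v = &vcpu->vel2;

	if (vcpu_mode_is_virtual_el2(vcpu)) {
		/*
		 * Reflect virtual EL2 state into the hardware EL1
		 * registers, so the guest hypervisor observes EL2
		 * semantics while actually executing in EL1.
		 * write_sysreg() is the usual arm64 helper.
		 */
		write_sysreg(v->sctlr_el2, sctlr_el1);
		write_sysreg(v->ttbr0_el2, ttbr0_el1);
		write_sysreg(v->vbar_el2, vbar_el1);
	}

	/*
	 * Pure EL2 state with no EL1 counterpart (e.g. hcr_el2) stays
	 * software-managed and is consulted when emulating traps.
	 */
}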
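The shadow stage-2 scheme can likewise be sketched as a fault handler
that composes the two translations on demand (again, all helper names
here are made up for illustration):

/* Invoked on a stage-2 fault taken while a nested VM is running. */
static int handle_shadow_s2_fault(struct nested_vcpu *vcpu,
				  u64 nested_ipa)
{
	u64 l1_ipa, pa;
	int perm_virt, perm_host;

	/*
	 * 1. Walk the guest hypervisor's stage-2 tables (rooted at its
	 *    virtual VTTBR_EL2) in software to translate the nested
	 *    VM's IPA into the L1 VM's IPA.
	 */
	if (walk_virtual_stage2(vcpu, nested_ipa, &l1_ipa, &perm_virt))
		return inject_s2_fault_to_guest_hyp(vcpu, nested_ipa);

	/*
	 * 2. Translate the L1 VM's IPA into a machine PA using the
	 *    host's own stage-2/memslot information.
	 */
	if (host_ipa_to_pa(vcpu, l1_ipa, &pa, &perm_host))
		return -EFAULT;

	/*
	 * 3. Install nested_ipa -> pa in the shadow stage-2 tables,
	 *    with the intersection of both permission sets so neither
	 *    level of the stack gains more access than it granted.
	 */
	return map_shadow_stage2(vcpu, nested_ipa, pa,
				 perm_virt & perm_host);
}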
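And the VGIC switching described above amounts to keeping a software
copy of the trapped GIC Hypervisor Control Interface state and loading
it into the real hardware on nested-VM entry. A rough sketch, with the
struct, helpers, and gich_base mapping invented for illustration (the
GICH_* names are the real GICv2 register offsets):

/*
 * Software copy of the virtual GICH interface that the guest
 * hypervisor programs via trapped MMIO accesses.
 */
struct shadow_gich {
	u32 hcr;	/* virtual GICH_HCR */
	u32 vmcr;	/* virtual GICH_VMCR */
	u32 lr[64];	/* virtual list registers (GICH_LRn) */
};

/*
 * On nested-VM entry, load the guest hypervisor's (trapped) GICH
 * programming into the real hardware interface, so the GIC delivers
 * virtual interrupts to the nested VM directly; when switching back
 * to a normal VM, the host's own GICH state is restored instead.
 */
static void load_shadow_gich_state(struct shadow_gich *s, int nr_lr)
{
	int i;

	writel_relaxed(s->hcr, gich_base + GICH_HCR);
	writel_relaxed(s->vmcr, gich_base + GICH_VMCR);
	for (i = 0; i < nr_lr; i++)
		writel_relaxed(s->lr[i], gich_base + GICH_LR0 + 4 * i);
}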
> > For timer virtualization, the guest hypervisor expects to have
> > access to the EL2 physical timer, the EL1 physical timer, and the
> > virtual timer, so the host hypervisor needs to provide all of them.
> > The virtual timer is always available to VMs. The physical timer is
> > available to VMs via my previous patch series[3]. The EL2 physical
> > timer is not supported yet in this RFC. We plan to support it, as it
> > is required to run other guest hypervisors such as Xen.
> >
> > Even though this work is not complete (see limitations below), I'd
> > appreciate early feedback on this RFC. Specifically, I'm interested
> > in:
> > - Is it better to have a kernel config or to make it configurable at
> >   runtime?
> > - I wonder if the data structure for memory management makes sense.
> > - What architecture version do we support for the guest hypervisor,
> >   and how? For example, do we always support all architecture
> >   versions, or the same architecture as the underlying hardware
> >   platform? Or is it better to make it configurable from userspace?
> > - Initial comments on the overall design?
> >
> > This patch series is based on kvm-arm-for-4.9-rc7 with the patch
> > series to provide VMs with the EL1 physical timer[2].
> >
> > Git: https://github.com/columbia/nesting-pub/tree/rfc-v1
> >
> > Testing:
> > We have tested this on ARMv8.0 (Applied Micro X-Gene)[3], since
> > ARMv8.3 hardware is not available yet. We have paravirtualized the
> > guest hypervisor to trap to EL2, as specified in the ARMv8.3
> > specification, using the hvc instruction. We plan to test this on
> > the ARMv8.3 model, and will post the results and a v2 if necessary.
> >
> > Limitations:
> > - This patch series only supports arm64, not arm. All the patches
> >   compile on arm, but I haven't tried to boot normal VMs on it.
> > - A guest hypervisor with VHE (ARMv8.1) is not supported in this
> >   RFC. I have patches for that, but they need to be cleaned up.
> > - Recursive nesting (i.e. emulating ARMv8.3 in the VM) is not tested
> >   yet.
> > - Other hypervisors (such as Xen) on KVM are not tested.
> >
> > TODO:
> > - Test booting normal VMs on the arm architecture
> > - Test this on the ARMv8.3 model
> > - Support guest hypervisors using VHE
> > - Provide the guest hypervisor with the EL2 physical timer
> > - Run other hypervisors such as Xen on KVM
>
> I have a couple of overall questions and comments on this series:
>
> First, I think we should make sure that the series actually works with
> v8.3 on the model, using both VHE and non-VHE for the host hypervisor.

I agree. I will send out a v2 once I have made this work with the v8.3
model.

> Second, this patch set is pretty large overall and it would be great
> if we could split it up into some slightly more manageable bits. I'm
> not exactly sure how to do that, but perhaps we can rework it so that
> we add bits of framework (CPU, memory, interrupt, timers) as
> individual series, and finally we plug all the logic together with the
> current flow. What do you think?

I think that sounds great. I can start with the CPU patch series first.

> Third, we should follow the feedback from David about not using a
> kernel config option. I'm afraid that some code will bitrot too fast
> if guarded by a kernel config option, so a runtime parameter, using
> static keys where relevant, seems like a better approach to me. But
> since KVM/ARM is not loaded as a module, this would have to be a
> kernel cmdline parameter. What do people think?
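For reference, a runtime parameter combined with a static key could
look roughly like this (the parameter name and all identifiers are
placeholders, not a proposal of the final interface):

/* Sketch only: "kvm-arm.nested" and these names are placeholders. */
#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/string.h>

static DEFINE_STATIC_KEY_FALSE(kvm_nested_enabled);
static bool nested_param;

static int __init early_kvm_nested_cfg(char *buf)
{
	return strtobool(buf, &nested_param);
}
early_param("kvm-arm.nested", early_kvm_nested_cfg);

/* Called once from KVM/ARM init. */
void kvm_nested_init(void)
{
	if (nested_param)
		static_branch_enable(&kvm_nested_enabled);
}

/* Hot paths test the key; it patches down to a NOP when disabled. */
static inline bool nested_virt_in_use(void)
{
	return static_branch_unlikely(&kvm_nested_enabled);
}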
>
> Fourth, there are some places where we have hard-coded information
> (like the location of the GICH/GICV interfaces) which have to be
> fixed by adding the required userspace interfaces.

Right. I'll fix them, and I'll provide a link to the userspace changes
for this nesting work in the cover letter.

> Fifth, the ordering of the patches needs a bit of love. I think it's
> important that we build the whole infrastructure first, but leave it
> completely disabled until the end, and then plug in all the
> capabilities for userspace to create a nested VM at the very end. So,
> for example, I would expect that patch 03 would be the last patch in
> the series.

Ah, I got it. I'll reorder the patches accordingly.

> Overall though, this is a massive amount of work, and it's awesome
> that you were able to pull it together into a pretty nice initial RFC!

Thanks a lot for your help and reviews. I'll address the individual
reviews soon :)

Thanks,
Jintack

>
> Thanks!
> -Christoffer
