http://blogs.amd.com/developer/2010/03/29/amd-and-io-virtualization-on-magny-cours-processors/

Enhanced AMD-Vi in Opteron 6000 (a.k.a. “Magny-Cours”) Platform Architecture

Virtualization is the art of abstracting or hiding the “implementation” from the “experience”. The notion of various system components – CPU, MMU, and devices – presented to guest virtual machines (VMs) by the virtual machine monitor (VMM, e.g. a hypervisor) can be quite different from their real implementation in hardware, so a layer of software emulation is inevitable. The CPU and MMU are well behaved enough, and their abstractions sufficiently self-contained, that emulation of these components can be rendered fairly efficiently. Emulation of I/O devices, however, poses problems along many dimensions.
It is evident, then, that high-performing devices take a performance hit when their service is shared among multiple virtual devices. The first priority in the design of I/O virtualization solutions is therefore to eliminate the emulation bottleneck as far as possible – even at the cost of exclusively dedicating devices to the specific VMs that make best use of them. Such simplification of the device usage model (exclusive ownership) permits a guest operating system unrestricted use of the device: it can configure its native device driver to run the device in its best-performing configuration. However, this does not relieve the VMM of its responsibility to ensure secure isolation of guest VMs – it must still ensure that malicious DMAs are thwarted, that DMAs are redirected to the correct memory locations, and that interrupts are redirected to the correct virtual/physical CPUs running the guest VMs. Software intercepts providing these facilities would again incur costs comparable to software emulation itself – hence we need hardware support to efficiently manage secure direct assignment of I/O devices.

AMD-Vi (a.k.a. IOMMU)

As mentioned above, device emulation is the most time-consuming component of virtualization. The overhead comes from intercepting DMA accesses and interrupts on behalf of the guest OS. To prevent malicious DMA accesses from unauthorized devices, the hypervisor has to intercept all DMA requests and then copy the data to (or from) the guest memory area. For the same reason, interrupts have to be received by the hypervisor and then re-delivered. This causes tremendous overhead for devices with many DMA requests or interrupts. To solve these problems, a new hardware feature has been added to the AMD SR56x0 chipset. This feature, called AMD-Vi™ (standing for AMD I/O Virtualization), can be used to control DMA accesses and interrupts for devices installed in the system.
AMD-Vi™ clearly has the following benefits: (1) security through isolation; (2) faster performance due to low overhead. There are various usage scenarios for AMD-Vi™. Beyond device passthru inside hypervisors, AMD-Vi™ can serve as an instrument of device isolation for security: for instance, it can prevent malicious devices from accessing system memory by controlling their DMA accesses.

It is straightforward to configure AMD-Vi™ in a hypervisor. The AMD-Vi™ feature is discovered via an ACPI IVRS table (I/O Virtualization Reporting Structure), which describes the topology and configuration of AMD-Vi™. If AMD-Vi™ is present, software can then set up the AMD-Vi™ control registers, along with software data structures such as the I/O page table, event log, command buffer, and device table. After that, AMD-Vi™ can be enabled to perform DMA translation and interrupt remapping. It is worth noting that only the hypervisor needs to be changed: programmers do not need to change anything inside the guest OS. In other words, the native device driver inside the guest OS can be used directly.

AMD-Vi™ was designed with many advanced features, such as flexible I/O page tables and address skipping. Together with many other optimization techniques, this makes AMD-Vi™ extremely fast. In our testing, we found that passthru performance is close to native for PCI-E devices with intensive I/O traffic (including 10G NICs, HBA storage controllers, and even graphics cards). This is a significant boost considering that virtualized devices used to be performance bottlenecks in virtualized environments.

Passthru of legacy devices has one major limitation: each device can be assigned to only one guest VM. This is very inconvenient given that many guest VMs can be consolidated onto a machine that has only a limited number of PCI/PCI-E slots. SR-IOV is a new PCI-E feature that solves this problem.
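The discovery step described above – finding and validating the IVRS table – can be sketched as follows. The 36-byte ACPI system description table header and its whole-table checksum rule are standard ACPI; everything IVRS-specific beyond the signature check (IVinfo, the IVHD/IVMD entries that follow the header) is omitted here for brevity, so treat this as a probe sketch rather than a full parser:

```c
/* Sketch: sanity-checking a candidate ACPI IVRS table, as a hypervisor
 * would before programming AMD-Vi. Only the standard ACPI header is
 * modeled; IVRS-specific content after the header is not parsed. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct acpi_table_header {
    char     signature[4];      /* "IVRS" for the I/O Virtualization
                                   Reporting Structure */
    uint32_t length;            /* total table length in bytes */
    uint8_t  revision;
    uint8_t  checksum;          /* whole table must sum to 0 mod 256 */
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     creator_id[4];
    uint32_t creator_revision;
} __attribute__((packed));

/* Returns 1 if the buffer looks like a valid IVRS table. */
int ivrs_probe(const uint8_t *buf, uint32_t buflen)
{
    const struct acpi_table_header *h = (const void *)buf;
    if (buflen < sizeof(*h) || memcmp(h->signature, "IVRS", 4) != 0)
        return 0;
    if (h->length < sizeof(*h) || h->length > buflen)
        return 0;
    uint8_t sum = 0;            /* ACPI checksum: all bytes sum to 0 */
    for (uint32_t i = 0; i < h->length; i++)
        sum += buf[i];
    return sum == 0;
}
```

Only after such a probe succeeds would the hypervisor go on to build the device table, command buffer, and event log and flip the enable bits in the AMD-Vi control registers.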
For devices supporting SR-IOV, a single physical device can be configured to expose multiple virtual functions, each of which can be assigned to a separate guest VM. The hardware configuration and management all happen inside the PCI-E device itself, which minimizes overhead.

Another issue with the IOMMU is the limitation on device migration: it is currently impossible to migrate a guest operating system from one system to another while the IOMMU is enabled for a device assigned to it.

The IOMMU is supported in Xen, KVM, VMware ESX, and the native Linux kernel.

Written by: Sreekumar R. Nair, Senior Member of Technical Staff at AMD, and Wei Huang, Member of Technical Staff at AMD
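What makes SR-IOV mesh so well with the IOMMU is that each virtual function gets its own PCIe requester ID, so the IOMMU can place each VF in a different guest's protection domain. Per the PCIe SR-IOV specification, VF n (1-based) of a physical function has requester ID RID(PF) + FirstVFOffset + (n−1)·VFStride, where a 16-bit RID packs bus:device.function as bus<<8 | dev<<3 | fn. A small sketch of that arithmetic (the offset and stride values used in the usage note are illustrative examples, not taken from real hardware):

```c
/* Sketch of SR-IOV virtual-function requester-ID arithmetic.
 * Each VF's distinct RID is what lets an IOMMU assign it to its
 * own guest VM independently of the physical function. */
#include <assert.h>
#include <stdint.h>

/* Pack a PCIe bus:device.function triple into a 16-bit requester ID. */
static uint16_t rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)(bus << 8 | (dev & 0x1f) << 3 | (fn & 0x7));
}

/* Requester ID of VF n (1-based), per the SR-IOV spec:
 * RID(VFn) = RID(PF) + FirstVFOffset + (n - 1) * VFStride. */
static uint16_t vf_rid(uint16_t pf_rid, uint16_t first_vf_offset,
                       uint16_t vf_stride, unsigned n)
{
    return (uint16_t)(pf_rid + first_vf_offset + (n - 1) * vf_stride);
}
```

For example, with a (hypothetical) physical function at 04:00.0 and FirstVFOffset = 1, VFStride = 1, VF 1 lands at 04:00.1 and VF 8 at 04:01.0 – eight assignable functions from one physical slot, which is exactly the consolidation problem legacy passthru could not solve.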