http://blogs.amd.com/developer/2010/03/29/amd-and-io-virtualization-on-magny-cours-processors/

Enhanced AMD-Vi in Opteron 6000 (a.k.a. “Magny-Cours”) Platform Architecture:

Virtualization is the art of abstracting or hiding the “implementation” from the “experience”. Thus the notion of various system components – CPU, MMU, and devices – presented to guest Virtual Machines (VMs) by Virtual Machine Monitors (VMMs, such as hypervisors) can be quite different from their real implementation in hardware, and a layer of software emulation is therefore inevitable. The CPU and MMU are well behaved enough, and their abstractions self-contained enough, that emulation of these hardware components can be rendered fairly efficiently. Emulation of I/O devices, however, poses problems along several dimensions:

  1. The multitude of devices, and the multitude of configurations thereof, makes the emulation layer in the VMM extremely complex – the VMM is required to implement drivers for every device “under the sun” to be competitive in a market where the number and complexity of new device architectures keep leaping ahead at an unbelievable pace.
  2. Synchronizing accesses to the same physical device from multiple virtual devices imposes severe overheads – every I/O operation has to filter through a level of software indirection – even when a physical device is used exclusively, or almost exclusively, by a single VM.
  3. The device drivers implemented in the VMM override the native device drivers in the guest VMs. Thus any optimizations implemented by a specific guest operating system to handle a specific device more efficiently (say, a graphics accelerator) are lost in the process of emulation – the user has to settle for an experience that is the “least common denominator” supported by the VMM.
  4. In addition to the above, the VMM is responsible for ensuring secure isolation of guest VMs, which requires many levels of detailed supervision by the VMM:
    1. Preventing malicious Direct Memory Accesses (DMA) from corrupting the memory space of other VMs or of the VMM itself.
    2. Ensuring that DMA requests from I/O devices are redirected to the correct target physical memory addresses, and that interrupts raised by I/O devices are redirected to the correct virtual (and subsequently physical) CPUs running the guest VMs.

Thus it is evident that high-performing devices take a performance hit when their service is shared by multiple virtual devices. The first priority in the design of I/O virtualization solutions is to eliminate the emulation bottleneck as far as possible – even at the cost of exclusively dedicating devices to the specific VMs that make the best use of them. Such a simplification of the device usage model (exclusive ownership) permits guest operating systems to get unrestricted use of the device – they can configure their native device drivers to use these exclusive devices in their best-performing configurations.

However, this still does not remove from the VMM the responsibility of ensuring secure isolation of guest VMs – it still needs to ensure that malicious DMAs are thwarted, that DMAs are redirected to the correct memory locations, and that interrupts are redirected to the correct virtual/physical CPUs running the guest VMs. Software intercepts that provide these facilities would again incur costs comparable to software emulation itself – hence hardware support is required to manage secure direct assignment of I/O devices efficiently.

AMD-Vi (a.k.a. IOMMU)

As mentioned above, device emulation is the most time-consuming component of virtualization. The overhead comes from intercepting DMA accesses and interrupts on behalf of the guest OS. To prevent malicious DMA accesses from unauthorized devices, the hypervisor has to intercept every DMA request and then copy the data to (or from) the guest memory area. For the same reason, interrupts have to be received by the hypervisor and then re-delivered. This causes tremendous overhead for devices that generate many DMA requests or interrupts. To solve these problems, a new hardware feature has been added to the AMD SR56x0 chipset. This feature, called AMD-Vi™ (standing for AMD I/O Virtualization), can be used to control DMA accesses and interrupts for devices installed in the system.
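
To picture what it means for the hardware to “control interrupts”, the following is a minimal C sketch of the interrupt remapping idea, using an invented table layout rather than the actual AMD-Vi™ entry format: each (device, interrupt) pair indexes a remapping entry that names the destination CPU and vector, so the interrupt can be delivered where the guest expects it without the hypervisor having to receive and re-inject it.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative interrupt remapping entry: where an interrupt raised by
     * a given device should actually be delivered.  The real AMD-Vi table
     * entries are denser and are set up per device by the hypervisor. */
    struct irte {
        bool    valid;
        uint8_t dest_apic;   /* physical CPU (APIC ID) running the vCPU */
        uint8_t vector;      /* vector number the guest programmed      */
    };

    #define MAX_IRQS 32
    static struct irte remap_table[1 << 16][MAX_IRQS];  /* per BDF, per IRQ */

    /* Hardware-style lookup: either remap the interrupt or drop it.
     * No hypervisor intercept happens on this path. */
    static bool remap_irq(uint16_t bdf, uint8_t irq_index,
                          uint8_t *dest_apic, uint8_t *vector)
    {
        const struct irte *e = &remap_table[bdf][irq_index % MAX_IRQS];

        if (!e->valid)
            return false;            /* unconfigured source: blocked */
        *dest_apic = e->dest_apic;
        *vector    = e->vector;
        return true;
    }

    int main(void)
    {
        remap_table[0x0300][0] = (struct irte){ true, 2, 0x41 };

        uint8_t apic, vec;
        if (remap_irq(0x0300, 0, &apic, &vec))
            printf("deliver vector 0x%x to APIC %u\n",
                   (unsigned)vec, (unsigned)apic);
        return 0;
    }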

AMD-Vi™ clearly has the following benefits: (1) security through isolation; (2) faster performance due to low overhead. There are various usage scenarios for AMD-Vi™. Besides enabling device passthru inside hypervisors, AMD-Vi™ can be used as an instrument of device isolation for security. For instance, we can prevent malicious devices from accessing system memory by controlling their DMA accesses via AMD-Vi™.
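
As an illustration of that isolation, here is a minimal C sketch (again with an invented layout, not the real AMD-Vi™ device table format) of the kind of check performed in hardware on every DMA request: the requester ID (bus/device/function) selects a device table entry, and a device with no valid entry simply cannot reach system memory.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative "device table" keyed by the PCI requester ID
     * (bus/device/function).  The real AMD-Vi device table entries are
     * larger and live in system memory set up by the hypervisor. */
    struct dev_table_entry {
        bool     valid;        /* is this device allowed to perform DMA?  */
        uint16_t domain_id;    /* protection domain (guest) it belongs to */
        uint64_t page_table;   /* root of its I/O page table              */
    };

    static struct dev_table_entry dev_table[1 << 16];  /* one entry per BDF */

    /* Decide whether a DMA request from 'bdf' targeting device address
     * 'dev_addr' may proceed.  Unknown or invalid devices are blocked. */
    static bool dma_allowed(uint16_t bdf, uint64_t dev_addr)
    {
        const struct dev_table_entry *e = &dev_table[bdf];

        if (!e->valid) {
            fprintf(stderr, "blocked DMA from %04x to %#llx\n",
                    bdf, (unsigned long long)dev_addr);
            return false;      /* isolation: no entry, no memory access */
        }
        /* A real IOMMU would now walk e->page_table to translate and
         * permission-check dev_addr; omitted here. */
        return true;
    }

    int main(void)
    {
        dev_table[0x0300].valid = true;               /* allow device 03:00.0 */
        printf("%d\n", dma_allowed(0x0300, 0x1000));  /* 1: allowed */
        printf("%d\n", dma_allowed(0x0401, 0x1000));  /* 0: blocked */
        return 0;
    }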

It is straightforward to configure AMD-Vi™ in a hypervisor. Discovery of the AMD-Vi™ feature is provided via the ACPI IVRS table (I/O Virtualization Reporting Structure), which describes the topology and configuration of AMD-Vi™. If AMD-Vi™ is present, software can then set up the AMD-Vi™ control registers, along with software data structures such as the I/O page tables, event log, command buffer and device table. After that, AMD-Vi™ can be enabled to perform DMA translation and interrupt remapping. It is worth noting that only the hypervisor needs to be changed; programmers do not need to change anything inside the guest OS. In other words, the native device driver inside the guest OS can be used directly.
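
As a rough sketch of the discovery step, the code below scans a list of ACPI tables for the “IVRS” signature using the standard 36-byte ACPI table header; the table list passed in (and the fake table in main) is a stand-in for whatever the hypervisor obtains by walking the RSDT/XSDT, and parsing of the IVRS body – and the subsequent programming of the device table, command buffer, event log and control registers – is omitted.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Standard 36-byte ACPI system description table header. */
    struct acpi_sdt_header {
        char     signature[4];     /* "IVRS" marks the I/O Virtualization
                                      Reporting Structure                  */
        uint32_t length;           /* total table length in bytes          */
        uint8_t  revision;
        uint8_t  checksum;
        char     oem_id[6];
        char     oem_table_id[8];
        uint32_t oem_revision;
        uint32_t creator_id;
        uint32_t creator_revision;
    } __attribute__((packed));

    /* Return the IVRS table if the platform reports AMD-Vi, else NULL.
     * The IVRS body (following this header) describes each IOMMU unit and
     * the devices behind it. */
    static struct acpi_sdt_header *find_ivrs(struct acpi_sdt_header **tables,
                                             unsigned count)
    {
        for (unsigned i = 0; i < count; i++) {
            if (memcmp(tables[i]->signature, "IVRS", 4) == 0)
                return tables[i];
        }
        return NULL;   /* no AMD-Vi reported by firmware */
    }

    int main(void)
    {
        /* Fabricated table standing in for the real RSDT/XSDT walk. */
        static struct acpi_sdt_header fake = {
            .signature = { 'I', 'V', 'R', 'S' }
        };
        struct acpi_sdt_header *tables[] = { &fake };

        printf("AMD-Vi %s\n", find_ivrs(tables, 1) ? "present" : "absent");
        return 0;
    }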

AMD-Vi™ was designed with many advanced features, such as flexible I/O page tables and address skipping. Combined with many other optimization techniques, AMD-Vi™ is extremely fast. In our testing, we found that device passthru performance is close to native for PCI-E devices with intensive I/O traffic (including 10G NICs, HBA storage controllers, and even graphics cards). This is a significant performance boost considering that virtualized devices used to be performance bottlenecks in virtualized environments.
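
To give a feel for what “flexible I/O page tables and address skipping” buys, the sketch below walks a variable-depth I/O page table in which each entry records the level of the table it points to, so large, well-aligned mappings can skip intermediate levels; the entry layout is illustrative, not the exact AMD-Vi™ encoding.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative I/O page-table entry.  'next_level' names the level of
     * the table (or page) that the entry points to, which is what allows a
     * walk to skip levels instead of always descending one at a time. */
    struct iopte {
        bool          present;
        bool          write;
        uint8_t       next_level;   /* 0 means the entry is a leaf          */
        struct iopte *next_table;   /* lower-level table (next_level > 0)   */
        uint64_t      page_frame;   /* final translation (next_level == 0)  */
    };

    /* Translate a device (guest) address, walking a variable-depth table
     * with 9 address bits per level.  Returns false on an IOMMU fault. */
    static bool io_translate(struct iopte *table, uint8_t level,
                             uint64_t dev_addr, bool is_write,
                             uint64_t *sys_addr)
    {
        while (level > 0) {
            unsigned shift = 12 + 9 * (level - 1);
            struct iopte *e = &table[(dev_addr >> shift) & 0x1ff];

            if (!e->present || (is_write && !e->write))
                return false;                 /* fault: DMA is blocked        */
            if (e->next_level == 0) {         /* leaf covering 2^shift bytes  */
                *sys_addr = e->page_frame | (dev_addr & ((1ull << shift) - 1));
                return true;
            }
            if (e->next_level >= level)
                return false;                 /* malformed table              */
            table = e->next_table;
            level = e->next_level;            /* may skip one or more levels  */
        }
        return false;
    }

    int main(void)
    {
        /* A level-1 table mapping device page 0 to system page 0x5000,
         * reached directly from a level-3 root entry (level 2 skipped). */
        static struct iopte l1[512], root[512];
        l1[0]   = (struct iopte){ .present = true, .write = true,
                                  .next_level = 0, .page_frame = 0x5000 };
        root[0] = (struct iopte){ .present = true, .write = true,
                                  .next_level = 1, .next_table = l1 };

        uint64_t sys;
        if (io_translate(root, 3, 0x123, false, &sys))
            printf("device address 0x123 -> system address %#llx\n",
                   (unsigned long long)sys);
        return 0;
    }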

Passthru of legacy devices has one major limitation: each device can only be assigned to one guest VM. This is very inconvenient given that many guest VMs may be consolidated onto a machine that has only a limited number of PCI/PCI-E slots. SR-IOV is a new PCI-E feature that solves this problem. For devices supporting SR-IOV, a physical device can be configured to expose multiple virtual functions, and each virtual function can be assigned to a separate guest VM. The hardware configuration and management all happen inside the PCI-E device, which minimizes the overhead.
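
On a Linux host, for example, the virtual functions of an SR-IOV-capable adapter can typically be enabled by writing the desired VF count to the device's sriov_numvfs attribute in sysfs, after which each VF appears as an ordinary PCI-E function that can be passed through to a guest. The PCI address below is only a placeholder.

    #include <stdio.h>

    /* Hypothetical PCI address of an SR-IOV-capable NIC; adjust as needed. */
    #define PF_SYSFS "/sys/bus/pci/devices/0000:03:00.0"

    /* Ask the physical function's driver to create 'num_vfs' virtual
     * functions via the standard Linux sysfs interface. */
    static int enable_vfs(int num_vfs)
    {
        FILE *f = fopen(PF_SYSFS "/sriov_numvfs", "w");

        if (!f) {
            perror("open sriov_numvfs");
            return -1;
        }
        fprintf(f, "%d\n", num_vfs);
        /* Note: if VFs are already enabled, 0 must be written before a
         * different count is accepted. */
        return fclose(f) == 0 ? 0 : -1;
    }

    int main(void)
    {
        return enable_vfs(4) == 0 ? 0 : 1;
    }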

Another issue with the IOMMU is the limitation it places on migration: it is not possible to migrate a guest operating system from one system to another while IOMMU-based device passthru is in use.

IOMMU support is available in Xen, KVM, VMware ESX and the native Linux kernel.
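
As one concrete example of the Linux-side support, the sketch below uses the kernel's VFIO interface (a newer userspace-facing layer over the IOMMU, on which KVM/QEMU device assignment is built in recent kernels) to create an IOMMU container with the Type1 backend and map a buffer for device DMA; the IOMMU group number is a placeholder and error handling is minimal.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/vfio.h>

    /* Placeholder IOMMU group; the real group for a device is found under
     * /sys/bus/pci/devices/<BDF>/iommu_group. */
    #define GROUP_PATH "/dev/vfio/26"
    #define MAP_SIZE   (1024 * 1024)

    int main(void)
    {
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group     = open(GROUP_PATH, O_RDWR);
        if (container < 0 || group < 0) {
            perror("open vfio");
            return 1;
        }

        /* Attach the group to the container and select the Type1 IOMMU
         * backend (the x86 model used for AMD-Vi and Intel VT-d). */
        if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) ||
            ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU)) {
            perror("vfio setup");
            return 1;
        }

        /* Map 1 MiB of our memory into the device's I/O address space at
         * IOVA 0, readable and writable by DMA. */
        void *buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (unsigned long)buf,
            .iova  = 0,
            .size  = MAP_SIZE,
        };
        if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map)) {
            perror("VFIO_IOMMU_MAP_DMA");
            return 1;
        }
        printf("buffer mapped for DMA at IOVA 0\n");
        return 0;
    }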

Written by: Sreekumar R. Nair, Senior Member of Technical Staff at AMD and Wei Huang, Member of Technical Staff at AMD

