> /...direct access to the I/O devices, which is addressed by VT-d./

For most people, this is not true. I'd guess that VT-d (aka IOMMU) is used by less than 1% of users. IOMMU is not used for disk, network, or video I/O.

For the common I/O devices, like hard disk, NIC, video card, and mouse, virtual machines use special software drivers to access hardware "directly" (meaning, with decent performance). For Linux KVM, these drivers are called "virtio". For VirtualBox, there is a menu item "Devices > Install Guest Additions" (or "VBoxLinuxAdditions-x86.run" under Linux). Under VMware, it's in the menu "VM > Install VMware Tools".
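For KVM, using virtio is just a matter of picking the virtio model for the disk and NIC when the guest is started. A minimal sketch (the image name is a placeholder, and the script skips cleanly if qemu isn't installed or the image doesn't exist):

```shell
# Sketch: launching a KVM guest with virtio disk and network devices.
# "guest.img" is a placeholder -- substitute your own disk image.
IMG=guest.img
if command -v qemu-system-x86_64 >/dev/null 2>&1 && [ -f "$IMG" ]; then
    qemu-system-x86_64 -enable-kvm -m 1024 \
        -drive file="$IMG",if=virtio \
        -net nic,model=virtio -net user &
    launched=yes
else
    launched=skipped    # qemu not installed, or no disk image present
fi
echo "launch status: $launched"
```

The guest then needs the virtio drivers installed (modern Linux kernels ship them), just as a VirtualBox guest needs the Guest Additions.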

But that has nothing to do with IOMMU/VT-d support. ("VT-d" is Intel's name for IOMMU support, and "AMD-Vi" is AMD's name for it.) IOMMU support is sometimes called "PCI passthrough", and it is how a VM can talk "directly" to PCI hardware (including the USB bus, and therefore USB devices too).

For IOMMU to work, the CPU, the BIOS, and the motherboard must all support it. Having IOMMU support on the motherboard is not common for desktop hardware; mostly it is expensive rack-mount servers that have it (but check your mobo specs to find out). (Also, some older mobos need a BIOS flash update to make IOMMU work correctly, and some have IOMMU that is Just Plain Broken.)
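On a reasonably recent Linux kernel you can sanity-check whether the host actually has a working IOMMU. A rough sketch (the kernel must also be booted with intel_iommu=on or amd_iommu=on for anything to show up):

```shell
# Sketch: check whether the running kernel sees a working IOMMU.
# /sys/class/iommu is empty (or absent) when there is none.
# "dmesg | grep -i -e DMAR -e IOMMU" gives more detail at boot.
if [ -d /sys/class/iommu ] && [ -n "$(ls -A /sys/class/iommu 2>/dev/null)" ]; then
    iommu_active=yes
else
    iommu_active=no
fi
echo "IOMMU active: $iommu_active"
```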

When using "direct" I/O to a device via IOMMU, you must issue special commands to tell the host OS to stop talking to that PCI address (because the guest OS will now talk to that address directly). The virsh "nodedev-detach" command is an example of this.
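A sketch of that workflow with libvirt's virsh (the PCI device name below is a made-up placeholder; list the real ones with "virsh nodedev-list"; skips cleanly if libvirt isn't installed):

```shell
# Sketch: detach a PCI device from the host before passthrough.
# pci_0000_06_12_0 is a placeholder device name, not a real one.
DEV=pci_0000_06_12_0
if command -v virsh >/dev/null 2>&1; then
    if virsh nodedev-detach "$DEV"; then
        detached=yes
        # ...start the guest with the device assigned, then later:
        # virsh nodedev-reattach "$DEV"
    else
        detached=failed   # expected with a placeholder device name
    fi
else
    detached=skipped      # libvirt tools not installed here
fi
echo "detach status: $detached"
```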

Only one guest OS (or else the host OS) can talk to an I/O device directly (using the IOMMU support), because all PCI hardware is designed to take commands from exactly one OS at a time. If multiple OSes tried talking to a PCI card at once, the hardware would freak out and (probably) lock up your computer. That is why the hard disk and NIC are not shared using IOMMU; instead, the VM hypervisor will share the NIC and disk I/O to multiple guests using the special drivers mentioned above.

And that is the reason that disk and network I/O is so slow under all virtual machines. That's why VMware's license won't let you publish performance numbers: a vanilla VM (without performance tuning) will have disk I/O as slow as molasses in winter, even with simple tests like "dd". It's also why most VM farms use dedicated, ultra-high-speed disk hardware, like Fibre Channel SANs. KVM's virtio only gets about 350 Mbit/s on a GigE network card. Even worse, once you get beyond a handful of servers on a box, the disk I/O contention becomes a serious problem (which is why vendors are working on features like Linux's KSM, Kernel Samepage Merging, which deduplicates guest memory and so eases the swapping and cache pressure behind that contention).
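For reference, here is the kind of crude "dd" test I mean; conv=fdatasync forces the data to disk before dd reports, so the MB/s figure is honest (the file path and size are arbitrary):

```shell
# Crude sequential-write benchmark with dd (GNU coreutils).
# dd prints its statistics on stderr, hence the 2>&1.
out=$(dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=64 conv=fdatasync 2>&1)
rm -f /tmp/ddtest.bin
echo "$out" | tail -n 1    # last line shows bytes copied, time, and MB/s
```

Run the same command inside a vanilla guest and on the host, and compare the two numbers.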

Personally, I don't think IOMMU is all that useful. Virtualization is useful because one computer can run many guests. But IOMMU only allows one guest (or else the host) to talk to one piece of hardware. If you have installed a special PCI card (i.e., not disk or network), then there is very little advantage to building a VM to talk to it. In most cases it would be much simpler to just let the host OS talk to it directly, rather than building a VM to talk to it (because that VM would still have slow disk and network).

Lately I have stopped using VMs for servers. All my servers are Linux, so I am switching over to Linux containers (LXC). LXC is like FreeBSD's jails -- you can run multiple servers, but there is no hardware virtualization. All guest "containers" share the same Linux kernel -- there is only one kernel ever running, even if you have multiple servers with multiple IP addresses. Everything -- CPU, disk I/O, NIC -- runs at 100% of full native speed. PCI devices that support it can be shared between multiple processes (in multiple containers), and all hardware access is as "direct" as it gets. With LXC, you can add lots of new containers and the disk paging is all handled by a single kernel, so there is no contention between multiple kernels. LXC requires no special hardware support -- not even VT or AMD-V in the processor -- and LXC can run at the same time as KVM (or other virtual machine software).
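The basic lxc-* workflow looks roughly like this (the container name and template are examples; it needs root and the lxc package, so the sketch skips cleanly when the tools aren't there):

```shell
# Sketch of the basic LXC workflow. "web01" and the "ubuntu"
# template are example names, not anything special.
NAME=web01
if command -v lxc-create >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    lxc-create -n "$NAME" -t ubuntu   # build a root filesystem from a template
    lxc-start  -n "$NAME" -d          # boot the container in the background
    lxc-ls                            # list containers
    lxc_status=ran
else
    lxc_status=skipped                # not root, or lxc not installed
fi
echo "lxc status: $lxc_status"
```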

LXC is in the mainline Linux kernel and now ships with Ubuntu. The biggest drawback is that the administration tools aren't up to par (yet). I have written my own scripts for creating and cloning LXC containers, but there are no nifty GUI tools for LXC (such as virt-manager or the VMware administration tools -- although LXC support is on the roadmap for virt-manager). Given the massive performance and efficiency advantages, I predict that LXC will take over all VM farms where most of the servers are Linux.

But for desktop virtualization (running a different OS on your workstation), I still prefer VirtualBox... or whatever it will be called when it goes the way of MySQL (Drizzle) and OpenOffice (LibreOffice).


--Derek

On 03/05/2011 06:43 PM, Jeff Maxwell wrote:
On Fri, Mar 4, 2011 at 12:04 PM, Derek Simkowiak <der...@realloc.net> wrote:

    There are no downsides I am aware of.  It just lets VMs run much
faster than without it.

    I don't know why they made it a BIOS option, rather than 'always on'.


The processor reports the capability to the BIOS via the CPUID instruction
and it must be enabled prior to use by an OS/hypervisor. The BIOS is given
the responsibility of enabling it after checking if the processor actually
supports it. The on/off switch in the BIOS is just like all the other knobs
and settings the BIOS has control over.

If your processor supports EPT (extended page tables; AMD calls it NPT, or
nested paging) I suggest enabling VT in the BIOS and EPT in
VirtualBox. This allows the virtual machine to handle its own paging without
the need for help from the hypervisor. This addresses one of the large
bottlenecks when running a virtual machine, the other being direct access to
the I/O devices, which is addressed by VT-d.

--
Jeff
Speaking for myself, not my employer.
