Kernel Log: Morton questions acceptance of Xen Dom0 code; file systems for SSDs
In his response to the invitation on the Linux Kernel Mailing List (LKML) for comments on the recently submitted Xen Dom0 patches, Andrew Morton asks whether it still makes sense to accept these kernel extensions, which allow Linux to operate as the privileged Xen domain (Dom0), into the main Linux development tree. He suggests that Xen may be the "old" way to achieve virtualisation, whereas the world is moving in a "new" direction, towards KVM, and that Linux developers could regret accepting Xen Dom0 support in three years' time ("I hate to be the one to say it, but we should sit down and work out whether it is justifiable to merge any of this into Linux. I think it's still the case that the Xen technology is the "old" way and that the world is moving off in the "new" direction, KVM? In three years' time, will we regret having merged this?").
This has prompted a debate on the relative merits and drawbacks of Xen and KVM. Jeremy Fitzhardinge, a long-standing Xen developer who sent the Xen Dom0 patches developed by him and others to the LKML, campaigned strongly for Xen, but was to some extent rebuffed by other well-known Linux developers, including Nick Piggin and Ingo Molnar. As one of the maintainers of the kernel's x86 architecture code, Molnar could have an important say in the decision on whether to accept Xen support. A decision has probably not yet been made, but in spite of the discussion stimulated by Morton and the objections of other kernel hackers, it's perfectly possible that the next-but-one Linux version (2.6.30) will incorporate Xen Dom0 code based on these patches.
How the situation arose
In any case, it's difficult to make predictions about the development of the Linux kernel, because many developers, not least Linus Torvalds, can considerably speed up or hold back the acceptance of patches. Four years ago, Morton himself predicted that acceptance of Xen support into the Linux kernel was imminent. At that time, however, kernel developers were already dissatisfied with some aspects of how it was integrated into the kernel sources, and asked for changes before it could be accepted.
While the Xen developers were working on this, other Linux-specific virtualisation solutions appeared, such as KVM (Kernel-based Virtual Machine) and Lguest (originally called Lhype). The kernel developers consequently pressed for an interface that lets the Linux kernel work as efficiently as possible as a paravirtualised guest under all of these and other virtualisation solutions, without large quantities of special code having to be included in the kernel for each hypervisor. The paravirt_ops abstraction layer emerged from this effort, largely under the leadership of the Lguest developer, and found its way into the main Linux development tree with Linux 2.6.20.
That same version also saw the developers accept the KVM virtualisation framework into Linux. Though only a few months old at the time, it fitted into the kernel much better than Xen support and, in the opinion of many kernel hackers, was clearly the technically more elegant solution: it uses the kernel itself as the hypervisor and can therefore draw on the kernel's existing infrastructure (scheduler, memory management, drivers), whereas the Xen hypervisor sits underneath the Linux kernel.
On the other hand, KVM requires CPUs with hardware virtualisation functions, such as AMD-V and Intel VT. Xen can also use these functions to virtualise unmodified guest systems, but if the CPU doesn't provide them, an operating system adapted to Xen can alternatively run as a paravirtualised guest under the Xen hypervisor. Fitzhardinge now cites this difference as one of the advantages of Xen, though all recent x86 server processors and many desktop and notebook CPUs provide virtualisation functions.
While KVM underwent constant, rapid development as part of normal work on the kernel, acquiring functions such as migration and PCI device pass-through for guests, the Xen developers were slow to move ahead with integrating Xen into the Linux kernel. Instead, they paid a lot of attention to further development of the Xen code that is also used in commercial Xen products. That code is based on Linux kernel 2.6.18 and doesn't satisfy the quality requirements of the kernel developers. The 2.6.18 kernel, however, lacks many drivers for more recent PC components, so distribution developers have been porting this Xen code to later kernels. This was and still is extremely laborious and, in practice, the result only works after a fashion. This is probably one of the reasons that motivated Red Hat to buy Qumranet, a company specialising in KVM, and subsequently (according to recently divulged plans) to use KVM in various Red Hat products.
In parallel with their further development of the "old" Xen code, Fitzhardinge and others worked on Xen patches that fit better into the kernel and make more use of paravirt_ops. First, they developed the code for Xen guest systems (DomU), which the kernel developers incorporated into Linux version 2.6.23. This code is active in the standard kernel in many distributions, so these now also work without any problem as guest kernels under a modern Xen hypervisor.
But the Xen hypervisor doesn't do a lot on its own: it collaborates closely with a Dom0 kernel. Until now, such a Dom0 kernel could only be built on the old Xen code base, which the kernel developers reject for its lack of quality. The patches now published by Fitzhardinge are intended to eliminate this deficiency and permit operation as a Dom0 system. As he himself writes, this has involved rewriting large chunks of the Xen code. The present patches only lay the foundations for Dom0 support and are far from providing the range of functions that the older Xen code base offers. Fitzhardinge is accordingly planning further series of patches. Some time is likely to go by before these are ready and the most egregious errors in the (still young or even non-existent) code have been eliminated – if indeed the kernel developers accept Dom0 support at all.
SSDs and file systems
Theodore Ts'o (also known as Ted Tso or Tytso) has blogged diligently in recent days on the subject of SSDs (solid state disks) and the Ext file systems in his care: Ext2, Ext3 and Ext4. He started off with "Should File Systems Be Optimised for SSDs?", going into detail on the mode of operation of SSDs and their wear-levelling functions. Ts'o's blog entry was inspired by Torvalds' recent discussion of the subject in a web forum. But Ts'o hasn't found a simple answer to the question of whether file systems have to be optimised for SSDs, because that depends on the SSD in use, as well as on the further development of SSD technology, which is still young and changing constantly.
In a further blog entry he discusses in detail whether Ext2 is actually more suitable for SSDs than Ext3 or Ext4, as folk wisdom and some blog and forum entries claim. He ran numerous tests in the process, making use of the support for running Ext4 without a journal, which was largely developed by Google engineers and has been accepted into Linux 2.6.29. His tests show the differences between the various file systems and mount options such as "noatime" and "relatime", and find that the "noatime" option affects performance far more than the choice of file system does.
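For readers who want to try the "noatime" option themselves, it is an ordinary mount option rather than a file-system feature; a hypothetical /etc/fstab entry (the device name and mount point here are invented for illustration) might look like this:

```
# hypothetical example: mount an ext4 volume with noatime, so that
# reading a file no longer triggers a metadata write for its access time
/dev/sda2  /home  ext4  defaults,noatime  0  2
```

The journal-less Ext4 variant Ts'o tested is set at file-system creation time instead, by disabling the has_journal feature, and so requires a 2.6.29 kernel and correspondingly recent e2fsprogs to use.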
Ts'o's blog entry "Fast ext4 fsck times, revisited" contains much additional material for anyone wishing to investigate file systems further, including fsck times measured with different ways of arranging data within the file system and on the storage medium. Christoph Hellwig gave a survey in an email of changes affecting the XFS file system that have recently been implemented or are planned for the near future. And if that isn't enough: while the developers of the btrfs file system are working diligently on optimising its performance, Daniel Phillips continues to tinker with Tux3, keeping the public at large happy with snippets of information. For example, he recently put online a PDF version of a presentation he gave at SCALE 7x, explaining the mode of operation of the Tux3 file system.
The managers of the stable kernel series published many new stable kernels in the first three weeks of February, but in recent days things have slowed down a little, so versions 2.6.27.19 and 2.6.28.7, which were already mentioned in the previous Kernel Log and appeared shortly afterwards, are for the moment up to date.
Development of version 2.6.29 has now reached rc7. On releasing it, Torvalds said an eighth pre-release version would probably be required, because the list of newly introduced errors (regressions) was still fairly long. It may therefore take around one and a half to two weeks more for Linux 2.6.29 to appear.
Further background and information about developments in the Linux kernel and its environment can be found in previous issues of the Kernel Log at The H Open Source.