LCA: Catching up with X.org

By Jonathan Corbet
January 23, 2009

For years, linux.conf.au has been one of the best places to go to catch up with the state of the X Window System; the 2009 event was no exception. There was a big difference this time around, though. X talks have typically been all about the great changes which are coming in the near future. This time, the X developers had a different story: most of those great changes are done and will soon be heading toward a distribution near you.

Keith Packard's talk started with that theme. When he spoke at LCA2008, there were a few missing features in X.org. Small things like composited three-dimensional graphics, monitor hotplugging, shared graphical objects, kernel-based mode setting, and kernel-based two-dimensional drawing. One of the main things holding all of that work back was the lack of a memory manager which could work with the graphics processor (GPU). It was, Keith said, much like programming everything in early Fortran; doing things with memory was painful.

That problem is history; X now has a kernel-based memory management system. It can be used to allocate persistent objects which are shared between the CPU and the GPU. Since graphical objects are persistent, applications no longer need to make backup copies of everything; these objects will not disappear. Objects have globally-visible names, which, among other things, allows them to be shared between applications. They can even be shared between different APIs, with objects being transformed between various types (image, texture, etc.) as needed. It looks, in fact, an awful lot like a filesystem; there may eventually be a virtual filesystem interface to these objects.

This memory manager is, of course, the graphics execution manager, or GEM. It is new code; the developers first started talking about the need to start over with a new memory manager in March, 2008. The first implementation was posted in April, and the code was merged for the 2.6.28 kernel, released in December. In the process, the GEM developers dropped a lot of generality; they essentially abandoned the task of supporting BSD systems, for example ("sorry about that," says Keith). They also limit support to some Intel hardware at this point. After seeing attempts at large, general solutions fail, the GEM developers decided to focus on getting one thing working, and to generalize thereafter. There is work in progress to get GEM working with ATI chipsets, but that project will not be done for a little while yet.

GEM is built around the shmfs filesystem code; much of the fundamental object allocation is done there. That part is easy; the biggest hassle turns out to be in the area of cache management. Even on Intel hardware, which is alleged to be fully cache-coherent, there are caching issues which arise when dealing with the GPU. Moving data between caches is very expensive, so caching must be managed with great care. This is a task they had assumed would be hard. "Unfortunately," says Keith, "we were right."

One fundamental design feature of GEM is the use of global names for graphical objects. Unlike previous APIs, GEM does not deal with physical addresses of objects in its API. That allows the kernel to move things around as needed; as a result, every application can work with the assumption that it has access to the full GPU memory aperture. Graphical objects, in turn, are referenced by "batch buffers," which contain sequences of operations for the GPU. The batch buffer is the fundamental scheduling unit used by GEM; by allowing multiple applications to schedule batch buffers for execution, the GEM developers hope to be able to take advantage of the parallelism of the GPU.
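To make the indirection concrete, here is a toy sketch in Python of why name-based references let the kernel move objects around. It is not the GEM interface: the class `ToyMemoryManager`, its methods `create`, `relocate`, and `execute`, and the representation of a batch buffer as a list of (operation, name) pairs are all hypothetical, invented for this illustration. The point is only that a batch buffer which carries names, rather than addresses, stays valid when an object is relocated.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class GraphicsObject:
    """A buffer whose placement is private to the (toy) kernel."""
    size: int
    offset: int  # current location in the aperture; never shown to clients


@dataclass
class ToyMemoryManager:
    """Hypothetical stand-in for a kernel memory manager; not the GEM API."""
    objects: dict = field(default_factory=dict)
    _names: count = field(default_factory=lambda: count(1))
    _next_free: int = 0

    def create(self, size: int) -> int:
        """Allocate an object and return its global name (a small integer)."""
        name = next(self._names)
        self.objects[name] = GraphicsObject(size, self._next_free)
        self._next_free += size
        return name

    def relocate(self, name: int, new_offset: int) -> None:
        """Move an object; clients holding only the name are unaffected."""
        self.objects[name].offset = new_offset

    def execute(self, batch: list) -> list:
        """Resolve each (operation, name) pair to an address at run time."""
        return [(op, self.objects[name].offset) for op, name in batch]


mm = ToyMemoryManager()
texture = mm.create(4096)
target = mm.create(8192)

# The batch buffer refers to objects by name only.
batch = [("read", texture), ("write", target)]
print(mm.execute(batch))

# The toy kernel moves the texture; the same batch still works.
mm.relocate(texture, 65536)
print(mm.execute(batch))
```

Because name resolution is deferred until the batch runs, the application never has to learn that the texture moved. That is the property which lets every application behave as if it had the whole aperture to itself.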

GEM replaces the "balkanized" memory management found in earlier APIs. Persistent objects eliminate a number of annoyances, such as the dumping of textures at every task switch. What is also gone is the allocation of the entire memory aperture at startup time; memory is now allocated as needed. And lots of data copying has been taken out. All told, it is a much cleaner and better-performing solution than its predecessors.

Getting this code into the kernel was a classic example of working well with the community. The developers took pains to post their code early, then they listened to the comments which came back. In the process of responding to reviews, they were able to make some internal kernel API changes which made life easier. In general, they found, when you actively engage the kernel community, making changes is easy.

The next step was the new DRI2 X extension, intended to replace the (now legacy) DRI extension. It has only three requests, enabling connection to the hardware and buffer allocation. The DRI shared memory area and its associated lock have been removed, eliminating a whole class of problems. Buffer management is all done in the X server; that makes life a lot easier.

Then, there is the kernel mode-setting (KMS) API - the other big missing piece. KMS gets user-space applications out of the business of programming the adapter directly, putting the kernel in control. The KMS code (merged for 2.6.29) also implements the fbdev interface, meaning that graphics and the console now share the same driver. Among other things, that will let the kernel present a traceback when the system panics, even if X is running. Fast user switching is another nice feature which falls out of the KMS merge. KMS also eliminates the need for the X server to run with root privileges, which should help security-conscious Linux users sleep better at night. The X server is a huge body of code which, as a rule, has never been through a serious security audit. It's a lot better if that code can be run in an unprivileged mode.

Finally, KMS holds out the promise of someday supporting non-graphical uses of the GPU. See the GPGPU site for information on the kinds of things people try to do once they see the GPU as a more general-purpose coprocessor.

All is not yet perfect, naturally. Beyond its limited hardware support, the new code also does not yet solve the longstanding "tearing" problem. Tearing happens when an update is not coordinated with the monitor's vertical refresh, causing half-updated screens. It is hard to solve without stalling the GPU to wait for vertical refresh, an operation which kills performance. So the X developers are looking at ways to context-switch the GPU. Then buffer copies can be queued in the kernel and caused to happen after the vertical refresh interrupt. It's a somewhat hard problem, but, says Keith, it will be fixed soon.
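The queue-then-flush idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any code the X developers have written: the class `ToyDisplay` and its methods `queue_copy` and `vblank_interrupt` are hypothetical names, and a "frame" here is just a string. It shows only the ordering: requests return immediately, and the visible buffer changes nowhere except in the interrupt handler.

```python
from collections import deque


class ToyDisplay:
    """Hypothetical model of a scanout buffer; not a real KMS or DRM interface."""

    def __init__(self, initial_frame):
        self.scanout = initial_frame  # what the monitor is currently showing
        self.pending = deque()        # buffer copies waiting for vertical refresh

    def queue_copy(self, frame):
        """Record a copy request and return at once, without stalling the caller."""
        self.pending.append(frame)

    def vblank_interrupt(self):
        """Run the queued copies while the monitor is between frames."""
        applied = len(self.pending)
        while self.pending:
            self.scanout = self.pending.popleft()
        return applied


display = ToyDisplay("frame0")
display.queue_copy("frame1")
display.queue_copy("frame2")

# Nothing visible has changed yet, so the monitor never sees a half-updated screen.
print(display.scanout)

# The copies happen only once the vertical refresh interrupt arrives.
display.vblank_interrupt()
print(display.scanout)
```

The sketch leaves out the hard part, which is keeping the GPU busy with other work while those copies wait; that is where the context switching mentioned above would come in.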

There is reason to believe this promise. The X developers have managed to create and merge a great deal of code over the course of the last year. Keith's talk was a sort of celebration; the multi-year process of bringing X out of years of stagnation and into the 21st century is coming to a close. That is certainly an achievement worth celebrating.

Postscript: Keith's talk concerned the video output aspect of the X Window System, but an output-only system is not particularly interesting. The other side of the equation - input - was addressed by Peter Hutterer in a separate session. Much of the talk was dedicated to describing the current state of affairs on the input side of X. Suffice it to say that it is a complex collection of software modules which have been bolted on over the years; see the diagram in the background of the picture to the right.

What is more interesting is where things are going from here. A lot of work is being done in this area, though, according to Peter, only a couple of developers are doing it. Much of the classic configuration-file magic has been superseded by HAL-based autoconfiguration code. The complex sequence of events which follows the attachment of a keyboard is being simplified. Various limits - on the number of buttons on a device, for example - are being lifted. And, of course, the multi-pointer X work (discussed at LCA2008) is finding its way into the mainline X server and into distributions.

The input side of X has received less attention, but it is still an area which has been crying out for work for some time. Now that work, too, is heading toward completion. For users of X (and that is almost all of us), life is indeed getting better.
