http://lwn.net/Articles/64234/

Improving kill_fasync()

Unix systems, and their variants, provide a number of ways for processes to manage multiple I/O streams simultaneously. One of those is through the use of I/O signals; a process can request to receive a SIGIO whenever a given file descriptor becomes available for reading or writing. Inside the kernel, this signalling is handled via a file-specific fasync_struct structure and a couple of helper functions. One of them, called fasync_helper(), simply helps the kernel (filesystem or driver) code track which processes have requested notification for a given file. The other, kill_fasync(), is invoked to actually deliver a signal to interested processes when the time comes.
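From user space, the request for SIGIO might look like the following. This is a minimal sketch rather than anything taken from the patch under discussion: it uses the standard fcntl() operations F_SETOWN and F_SETFL with O_ASYNC, while the helper names request_sigio() and sigio_demo() are invented for illustration, as is the choice of a Unix-domain socket pair as the file descriptor.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set by the handler when a SIGIO arrives. */
static volatile sig_atomic_t sigio_seen = 0;

static void on_sigio(int signo)
{
    (void)signo;
    sigio_seen = 1;
}

/* Hypothetical helper: ask for SIGIO when fd becomes ready.
 * Returns 0 on success, -1 on failure. */
int request_sigio(int fd)
{
    struct sigaction sa;
    int flags;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    /* Name this process as the recipient of the signal. */
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;

    /* Turn on asynchronous notification for this descriptor. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) < 0 ? -1 : 0;
}

/* Hypothetical demo: returns 1 if a SIGIO arrived after data was written
 * to the other end of a socket pair, 0 if not, -1 on setup failure. */
int sigio_demo(void)
{
    int sv[2], i;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (request_sigio(sv[0]) < 0)
        return -1;
    sigio_seen = 0;
    if (write(sv[1], "x", 1) != 1)
        return -1;

    /* Give the signal a little time to arrive. */
    for (i = 0; i < 100 && !sigio_seen; i++)
        usleep(10000);

    close(sv[0]);
    close(sv[1]);
    return sigio_seen ? 1 : 0;
}
```

Setting O_ASYNC is the step that leads the kernel to register the process for notification on that file; the later delivery of SIGIO is the part that kill_fasync() performs.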

The kernel uses a single reader/writer spinlock (fasync_lock) to serialize all calls to either helper function. In some situations, this lock appears to be hurting performance: more types of devices support I/O signalling than was once the case, and the increasing number of calls to kill_fasync() is creating lock contention. So Manfred Spraul did something about it, in the form of a patch which switches the I/O signalling code over to the read-copy-update mechanism for mutual exclusion. The result for his particular test load was an 80% reduction in the time required to send out I/O signals.

Linus, having issues with how some of the locking was done, didn't much like the patch. But he also had some ideas for reworking the whole I/O signal mechanism to get rid of a lot of unneeded code. The key is in the understanding that the list of processes wanting I/O signals is very similar to the list of processes simply waiting for the I/O itself. Either way, it is a list of processes that need to be notified when data becomes available or the file descriptor becomes writable. There is not a whole lot of difference between sending a SIGIO to the process and simply waking it up.

During the 2.5 development process, the wait queue mechanism was generalized somewhat; this Driver Porting Series article describes some of the changes which were made. The kernel function wake_up() (with several variants) is called to wake processes which are waiting on a wait queue; in 2.4 and prior kernels, it performed that wakeup directly. In 2.5, however, all wake_up() really does is call a special wakeup function, a pointer to which is stored in the wait queue entry. This indirection allows different processes to be awakened in different ways.
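The indirection can be illustrated with a small user-space model. This is not kernel code: every name in it (wait_entry, wait_queue, default_wake(), notify_wake(), add_waiter(), model_wake_up()) is made up for the sketch, and "waking" a waiter is reduced to setting a flag.

```c
#include <stddef.h>

struct wait_entry;

/* Per-entry wakeup callback; returns nonzero if it acted on the waiter. */
typedef int (*wake_fn)(struct wait_entry *entry);

struct wait_entry {
    wake_fn func;            /* how to wake this particular waiter */
    int woken;               /* model state: set by the default wakeup */
    int notified;            /* model state: set by the custom wakeup */
    struct wait_entry *next;
};

struct wait_queue {
    struct wait_entry *head;
};

/* Stand-in for the default behavior: make the sleeper runnable. */
int default_wake(struct wait_entry *entry)
{
    entry->woken = 1;
    return 1;
}

/* A non-default wakeup: notify the waiter in some other way. */
int notify_wake(struct wait_entry *entry)
{
    entry->notified = 1;
    return 1;
}

/* Put an entry on the queue, recording which wakeup function it wants. */
void add_waiter(struct wait_queue *q, struct wait_entry *entry, wake_fn func)
{
    entry->func = func;
    entry->woken = 0;
    entry->notified = 0;
    entry->next = q->head;
    q->head = entry;
}

/* The wakeup call knows nothing about how each waiter is woken; it
 * walks the queue and calls whatever function each entry carries. */
int model_wake_up(struct wait_queue *q)
{
    int count = 0;

    for (struct wait_entry *e = q->head; e != NULL; e = e->next)
        count += e->func(e);
    return count;
}
```

Two entries on the same queue can thus react differently to a single model_wake_up() call, which is the property that makes other uses of wait queues conceivable.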

So far, there are few cases where a non-default wakeup function is used. But there is no real reason why, with a suitable wakeup function, wait queues could not be used for any of a number of different process signalling tasks. The whole I/O signalling mechanism and its fasync_struct structure could really be replaced by a wait queue with a special wakeup function.

The only problem with this nice, elegant idea is that it won't work. kill_fasync() takes a "band" argument which eventually gets passed through to the target process as signal data. There is currently no way to pass that information to a wakeup function via wake_up(). Adding a data parameter to wake_up() would fix that problem and, perhaps, enable a number of other potential uses for wait queues. Such a change appears likely to happen - but not until 2.7; it really shouldn't be made in 2.6, now that the 2.6.0 kernel has come out.
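What a data parameter would buy can be shown with another user-space model. Again, this is only a sketch under stated assumptions, not a proposed kernel interface: sig_waiter, keyed_wake_fn, sigio_wake() and model_wake_up_data() are hypothetical names, and the BAND_IN and BAND_OUT values are illustrative stand-ins for the band values (such as POLL_IN and POLL_OUT) that callers hand to kill_fasync().

```c
#include <stddef.h>

/* Illustrative stand-ins for band values; the numbers mean nothing. */
enum { BAND_IN = 1, BAND_OUT = 2 };

struct sig_waiter;

/* Hypothetical wakeup callback that receives caller-supplied data. */
typedef void (*keyed_wake_fn)(struct sig_waiter *w, long data);

struct sig_waiter {
    keyed_wake_fn func;
    long last_band;          /* model state: band seen at "signal" time */
    struct sig_waiter *next;
};

/* Plays the role of a SIGIO-sending wakeup function: it records the
 * band rather than delivering a real signal. */
void sigio_wake(struct sig_waiter *w, long band)
{
    w->last_band = band;
}

/* A wakeup call with an added data parameter, forwarded unchanged to
 * every entry's wakeup function. */
void model_wake_up_data(struct sig_waiter *head, long data)
{
    for (struct sig_waiter *w = head; w != NULL; w = w->next)
        w->func(w, data);
}
```

With the extra argument in place, a caller that would have invoked kill_fasync() with a given band could pass that same value through the wakeup call, and the wakeup function would have what it needs to fill in the signal data.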

