http://lwn.net/Articles/22028/

The continuing development of I/O scheduling

The 2.5 development series has seen a great deal of work aimed at improving the performance of the block I/O subsystem. Recently there has been a resurgence of interest in I/O scheduling - deciding which disk I/O requests to process in which order. Optimal scheduling can keep the disks running at full speed and users happy, but the optimal solution can be hard to find. That doesn't stop the kernel hackers from trying, however. The anticipatory I/O scheduler work was covered here a couple of weeks ago; now a new approach is being tried which may improve I/O performance even more.

The technique being looked at is "stochastic fair queueing," and it is intended to bring greater fairness to I/O scheduling decisions. In a fair situation, all users of a particular drive would be able to execute about the same number of I/O requests over a given time. This approach to fairness gets rid of starvation problems, and ensures that all processes can get some work done. The hope would be, for example, that a streaming media application would be able to move its data without outages, even in the presence of other, disk-intensive applications.

The stochastic fair queueing approach was first developed in the networking world by Paul E. McKenney; his paper on the subject can be found on this page. In the networking context, stochastic fair queueing tries to divide the available bandwidth equally among all users. Ideally, a separate queue would be used for each ongoing connection, but high-performance routers lack the resources to do things that way. So a smaller number of queues is used, with each connection being assigned to a queue via a hash function. Packets are then taken from each queue in turn, dividing the bandwidth between them. If two high-bandwidth connections happen to land on the same queue, they will be penalized relative to the other queues; to address this problem, the hash function is periodically changed to redistribute connections among the queues. The algorithm works reasonably well and is easy to make fast; the Linux networking code has had a stochastic queueing module available for some time.
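The networking scheme described above can be sketched in a few lines. This is a toy illustration (the real Linux qdisc is kernel C code), and all names here are invented for the example: connections hash into a fixed set of subqueues, dequeueing round-robins over the subqueues, and a periodic "perturbation" of the hash salt re-spreads connections that happened to collide.

```python
from collections import deque

class StochasticFairQueue:
    """Toy sketch of stochastic fair queueing: connections are hashed
    into a fixed number of subqueues; dequeueing round-robins over the
    subqueues, giving each hash bucket roughly equal bandwidth."""

    def __init__(self, nqueues=8):
        self.queues = [deque() for _ in range(nqueues)]
        self.salt = 0          # perturbed periodically to break up collisions
        self.next_q = 0

    def perturb(self):
        # Changing the salt re-maps connections to different buckets, so
        # two heavy flows that landed together stop penalizing each other.
        self.salt += 1

    def enqueue(self, conn_id, packet):
        bucket = hash((conn_id, self.salt)) % len(self.queues)
        self.queues[bucket].append(packet)

    def dequeue(self):
        # Visit each subqueue in turn, returning from the first non-empty one.
        for _ in range(len(self.queues)):
            q = self.queues[self.next_q]
            self.next_q = (self.next_q + 1) % len(self.queues)
            if q:
                return q.popleft()
        return None
```

Note the trade-off the hashing buys: memory stays bounded no matter how many connections exist, at the cost of occasional unfair collisions, which the perturbation step repairs over time.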

In the disk I/O context, the aim is to divide the available disk bandwidth fairly between processes. The initial implementation by Jens Axboe creates 64 subqueues for each block I/O request queue, and distributes requests among the subqueues based on the process ID of the requestor. (Actually, it uses the process ID of the currently running process, which, in some situations, may not be the process that originated the request.) When the time comes to dispatch requests, one is taken from each subqueue, and the whole set is sorted before being sent to the drive for execution.
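The dispatch logic just described can be sketched as follows. This is a simplified illustration rather than the actual patch (which is kernel C working on real request structures); the class and method names are invented, and `pid % NUM_SUBQUEUES` stands in for whatever hash the real code uses. The key point is the two-step round: take one request per non-empty subqueue for fairness, then sort the batch by sector to keep seeking cheap.

```python
from collections import deque

NUM_SUBQUEUES = 64   # the subqueue count mentioned for the initial patch

class SFQIOScheduler:
    """Sketch of the stochastic fair queueing I/O scheduler idea:
    requests are bucketed by submitter PID into 64 subqueues; a
    dispatch round takes one request from each non-empty subqueue
    and sorts the batch before handing it to the drive."""

    def __init__(self):
        self.subqueues = [deque() for _ in range(NUM_SUBQUEUES)]

    def add_request(self, pid, sector):
        # A stand-in for the real hash of the requesting process ID.
        self.subqueues[pid % NUM_SUBQUEUES].append(sector)

    def dispatch_round(self):
        # One request from each non-empty subqueue: fairness between
        # processes within the round...
        batch = [q.popleft() for q in self.subqueues if q]
        # ...then sort the whole set so the drive can service it with
        # minimal head movement.
        batch.sort()
        return batch
```

A process that floods the queue with requests still gets only one slot per round, while a process issuing occasional requests is never starved.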

Taking things even further, Jens has also posted a complete fair queueing scheduler, which does away with the hash function used in the stochastic approach. Each process has its own queue, and requests are taken equally from all queues. It is hard to get fairer than that. Of course, as Jens points out, once you have this infrastructure in place, it is relatively easy to make things less fair again by adding, say, I/O priorities to processes.
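A minimal sketch of the fully fair variant might look like the following; again, the names are invented and this only illustrates the structure, not Jens's actual code. Each process gets its own queue, so there are no hash collisions to perturb away; the optional per-process weight shows how something like I/O priorities could later make the scheme deliberately less fair.

```python
from collections import deque

class FairQueueScheduler:
    """Sketch of complete fair queueing: one queue per process (no
    hashing, hence no collisions), with requests taken equally from
    every queue. A per-process weight hints at how I/O priorities
    could be layered on top."""

    def __init__(self):
        self.queues = {}      # pid -> deque of pending requests
        self.weights = {}     # pid -> requests dispatched per round

    def add_request(self, pid, sector, weight=1):
        self.queues.setdefault(pid, deque())
        self.weights[pid] = weight
        self.queues[pid].append(sector)

    def dispatch_round(self):
        batch = []
        for pid, q in list(self.queues.items()):
            # A weight of 1 for everyone is perfectly fair; a higher
            # weight would implement a simple I/O priority.
            for _ in range(self.weights[pid]):
                if q:
                    batch.append(q.popleft())
            if not q:
                del self.queues[pid]
        batch.sort()
        return batch
```

The cost of the extra fairness is a queue per process instead of a fixed 64, which is exactly the resource trade-off that pushed the networking world toward the stochastic approximation in the first place.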

Where this all appears to be heading (though probably not in the 2.5 series) is toward a configurable I/O scheduler with several possible algorithms which can be mixed and matched according to a site's local policy. In other words, it looks a lot like the traffic control code which has existed in the networking subsystem for a few years. As with networking, most sites will probably not need to tweak their disk scheduling regimes. Users with special needs, however, will be glad for the ability to fine-tune things to their specifications.
