Hi,

For the past few months I've been working intermittently on an I/O scheduler framework and a fair queuing policy for DragonFly BSD. First off, I want to note that I'm not a computer scientist and that my code is probably sub-optimal, completely off from what is considered an I/O scheduler in scientific papers, etc.
The FQ policy should serve mainly as a proof of concept of what the framework can do. It seems to work pretty decently on its own, as some rather naive benchmarks[1] I did confirm. I really want to emphasize that it's suboptimal and has some problems that, for example, limit overall write performance more than they should. Yet overall it should solve our extreme interactivity issues. As the graphs[1] show, read performance with ongoing writes has increased drastically, by about a factor of 3.

--

At this point I would like to make my work public and see some testing and especially some reviews. You can either fetch my iosched-current branch on leaf or just apply my patch[2] to the current master as of this writing, although it probably also applies to older kernels. The work basically consists of 4 parts:

- General system interfacing
- I/O scheduler framework (dsched)
- I/O scheduler fair queuing policy (dsched_fq or fq)
- userland tools (dschedctl and ionice)

--

After applying the patch you still won't notice any difference, as the default scheduler is the so-called noop (no operation) scheduler; it emulates our current behaviour. This can be confirmed with dschedctl -l:

# dschedctl -l
cd0     =>      noop
da0     =>      noop
acd0    =>      noop
ad0     =>      noop
fd0     =>      noop
md0     =>      noop

--

To enable the fq policy on a disk you have two options:

1) Set scheduler_{diskname}="fq" in /boot/loader.conf; e.g. if it should be enabled for da0:

   scheduler_da0="fq"

   Certain wildcards are also understood, e.g. scheduler_da* or scheduler_*. Note that serial numbers (sernos) are not supported (yet).

2) Use dschedctl:

# dschedctl -s fq -d da0
Switched scheduler policy of da0 successfully to fq.

After this, dschedctl -l should list the scheduler of da0 as fq. Another use of dschedctl is to list the available scheduling policies, which is of limited use right now, but I'll show its use anyway:

# dschedctl -p
 > noop
 > fq

--

The ionice priority is similar to nice, but its values range from 0 to 10, and unlike the usual nice, 10 is the highest priority and 0 the lowest. Usage is exactly the same as nice:

# ionice -n 10 sh read_big

--

A brief description of the inner workings of the FQ policy follows (a rough code sketch of the queuing discipline comes after the list):

- All requests (bios) are let through by default without any queuing.
- For each process/thread in the system, the average latency of its I/O and its tps (transactions per second) are calculated.
- A thread that runs every several hundred ms checks whether the disk bandwidth is saturated; if so, it allocates a fair share of the maximum transactions to each process/thread in the system that is doing I/O, taking the latency and the tps into account. Processes/threads exceeding their share get rate limited to a number of tps.
- Processes/threads sharing an ioprio each get an equal slice of the pie.
- Once a process/thread is rate limited, only the given number of bios go through. All bios exceeding the fair share of the thread/process in the scheduling time quantum are queued in a per-process/thread queue. Reads are queued at the front while writes are added to the back.
- Before dispatching a new bio for a process/thread, its queue is checked; if it is non-empty, the queued bios are dispatched first.
- A dispatcher thread runs every ~20 ms, dispatching bios for all processes/threads that have queued bios, up to the maximum number allowed.
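To make the queuing discipline above a bit more concrete, here is a rough sketch of what the per-process/thread backlog could look like. This is not the actual dsched_fq code: all the names (fq_tdctx, fq_bio, fq_enqueue, fq_dispatch, max_tps, ...) are made up for illustration, and locking, initialization and the per-quantum counter reset are left out.

#include <sys/queue.h>

struct fq_bio {
        TAILQ_ENTRY(fq_bio)     link;
        int                     is_read;  /* read vs. write request */
        /* ... the real bio would hang off here ... */
};

/* Per-process/thread context kept by the policy. */
struct fq_tdctx {
        TAILQ_HEAD(, fq_bio)    queue;        /* held-back bios; TAILQ_INIT'd elsewhere */
        int                     rate_limited; /* set by the balancing thread */
        int                     max_tps;      /* fair share for this quantum */
        int                     issued_tps;   /* reset at each quantum */
};

static void     fq_dispatch(struct fq_tdctx *tdctx);

/*
 * Called for every incoming bio; returns 1 if the bio may go to the
 * disk right away, 0 if it was held back.
 */
static int
fq_enqueue(struct fq_tdctx *tdctx, struct fq_bio *bio)
{
        /* Drain any backlog first so the new bio doesn't overtake it. */
        if (!TAILQ_EMPTY(&tdctx->queue))
                fq_dispatch(tdctx);

        /* Not rate limited, or still within the fair share: pass through. */
        if (!tdctx->rate_limited || tdctx->issued_tps < tdctx->max_tps) {
                ++tdctx->issued_tps;
                return (1);
        }

        /*
         * Over budget: hold the bio back.  Reads go to the front of
         * the queue, writes to the back.
         */
        if (bio->is_read)
                TAILQ_INSERT_HEAD(&tdctx->queue, bio, link);
        else
                TAILQ_INSERT_TAIL(&tdctx->queue, bio, link);
        return (0);
}

/*
 * Called from fq_enqueue() and by the dispatcher thread every ~20 ms:
 * push out as much of the backlog as the fair share allows.
 */
static void
fq_dispatch(struct fq_tdctx *tdctx)
{
        struct fq_bio *bio;

        while ((bio = TAILQ_FIRST(&tdctx->queue)) != NULL &&
            (!tdctx->rate_limited || tdctx->issued_tps < tdctx->max_tps)) {
                TAILQ_REMOVE(&tdctx->queue, bio, link);
                ++tdctx->issued_tps;
                /* ... hand the bio to the disk driver here ... */
        }
}

The reason for the two insert macros shows up in fq_dispatch(): queued reads always sit in front of the write backlog, so an interactive reader keeps making progress even while a heavy writer is being rate limited.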
--

Please take the time to try it and see how it works out for you. Some known issues are:

a) After a few hot swaps of scheduling policies a panic may occur. The reason for this is as yet unknown, but it has to do with the cleanup of the fqps.

b) Write-only performance (with theoretically no other I/O occurring) might be reduced. This is because the heuristic that detects whether the disk is saturated relies on the number of incomplete transactions, so even when the disk isn't really saturated but many I/Os are still in flight, the processes get rate limited, too.

Let me know how it works out for you, and send any suggestions on how to improve it. The next few weeks I'll have somewhat more limited time again due to exams coming up, but I'll try to be responsive.

Cheers,
Alex Hornung

[1]: http://leaf.dragonflybsd.org/~alexh/iosched-new.html
[2]: http://leaf.dragonflybsd.org/~alexh/iosched-current.diff