Hi,

For the past few months I've been working intermittently on an I/O scheduler framework and a fair queuing policy for DragonFly BSD. First off, I want to note that I'm not a computer scientist and that my code is probably sub-optimal, completely off from what is considered an I/O scheduler in scientific papers, etc.
The FQ policy should serve mainly as a proof of concept of what the framework can do. It seems to work pretty decently on its own, as some rather naive benchmarks[1] I did confirm. I really want to emphasize that it's suboptimal and has some problems that, for example, limit overall write performance more than they should. Yet overall it should solve our extreme interactivity issues. As the graphs[1] show, read performance with ongoing writes has increased drastically, by about a factor of 3.

--

At this point I would like to make my work public and see some testing and especially some reviews. You can either fetch my iosched-current branch on leaf or just apply my patch[2] to the current master as of this writing, although it probably also applies to older kernels. The work basically consists of 4 parts:

- General system interfacing
- I/O scheduler framework (dsched)
- I/O scheduler fair queuing policy (dsched_fq or fq)
- userland tools (dschedctl and ionice)

--

After applying the patch you still won't notice any difference, as the default scheduler is the so-called noop (no operation) scheduler; it emulates our current behaviour. This can be confirmed with dschedctl -l:

# dschedctl -l
cd0     =>      noop
da0     =>      noop
acd0    =>      noop
ad0     =>      noop
fd0     =>      noop
md0     =>      noop

--

To enable the fq policy on a disk you have two options:

1) Set scheduler_{diskname}="fq" in /boot/loader.conf; e.g. if it should be enabled for da0:

   scheduler_da0="fq"

   Certain wildcards are also understood, e.g. scheduler_da* or scheduler_*. Note that serial numbers (sernos) are not supported (yet).

2) Use dschedctl:

# dschedctl -s fq -d da0
Switched scheduler policy of da0 successfully to fq.

After this, dschedctl -l should list the scheduler of da0 as fq. Another use of dschedctl is to list the available scheduling policies, which is of limited use right now, but I'll show its use anyway:

# dschedctl -p
 > noop
 > fq

--

The ionice priority is similar to nice, but its values range from 0 to 10, and unlike the usual nice, 10 is the highest priority and 0 the lowest. Usage is exactly the same as nice:

# ionice -n 10 sh read_big

--

A brief description of the inner workings of the FQ policy follows (a rough code sketch of the queuing discipline comes after the list):

- All requests (bios) are let through by default without any queuing.
- For each process/thread in the system, the average latency of its I/O and its tps (transactions per second) are calculated.
- A thread that runs every several hundred ms checks whether the disk bandwidth is saturated; if so, it allocates a fair share of the maximum transactions to each process/thread in the system that is doing I/O, taking the latency and the tps into account. Processes/threads exceeding their share get rate limited to a number of tps.
- Processes/threads sharing an ioprio each get an equal slice of the pie.
- Once a process/thread is rate limited, only the given number of bios go through. All bios exceeding the fair share of the thread/process in the scheduling time quantum are queued in a per-process/thread queue. Reads are queued at the front while writes are added to the back.
- Before dispatching a new bio for a process/thread, its queue is checked; if it is non-empty, the queued bios are dispatched first.
- A dispatcher thread runs every ~20 ms, dispatching bios for all processes/threads that have queued bios, up to the maximum number allowed.
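To make the queuing discipline above a bit more concrete, here is a rough sketch of what the per-process/thread backlog could look like. This is not the actual dsched_fq code: all the names (fq_tdctx, fq_bio, fq_enqueue, fq_dispatch, max_tps, ...) are made up for illustration, and locking, initialization and the per-quantum counter reset are left out.

#include <sys/queue.h>

struct fq_bio {
        TAILQ_ENTRY(fq_bio)     link;
        int                     is_read;  /* read vs. write request */
        /* ... the real bio would hang off here ... */
};

/* Per-process/thread context kept by the policy. */
struct fq_tdctx {
        TAILQ_HEAD(, fq_bio)    queue;        /* held-back bios; TAILQ_INIT'd elsewhere */
        int                     rate_limited; /* set by the balancing thread */
        int                     max_tps;      /* fair share for this quantum */
        int                     issued_tps;   /* reset at each quantum */
};

static void     fq_dispatch(struct fq_tdctx *tdctx);

/*
 * Called for every incoming bio; returns 1 if the bio may go to the
 * disk right away, 0 if it was held back.
 */
static int
fq_enqueue(struct fq_tdctx *tdctx, struct fq_bio *bio)
{
        /* Drain any backlog first so the new bio doesn't overtake it. */
        if (!TAILQ_EMPTY(&tdctx->queue))
                fq_dispatch(tdctx);

        /* Not rate limited, or still within the fair share: pass through. */
        if (!tdctx->rate_limited || tdctx->issued_tps < tdctx->max_tps) {
                ++tdctx->issued_tps;
                return (1);
        }

        /*
         * Over budget: hold the bio back.  Reads go to the front of
         * the queue, writes to the back.
         */
        if (bio->is_read)
                TAILQ_INSERT_HEAD(&tdctx->queue, bio, link);
        else
                TAILQ_INSERT_TAIL(&tdctx->queue, bio, link);
        return (0);
}

/*
 * Called from fq_enqueue() and by the dispatcher thread every ~20 ms:
 * push out as much of the backlog as the fair share allows.
 */
static void
fq_dispatch(struct fq_tdctx *tdctx)
{
        struct fq_bio *bio;

        while ((bio = TAILQ_FIRST(&tdctx->queue)) != NULL &&
            (!tdctx->rate_limited || tdctx->issued_tps < tdctx->max_tps)) {
                TAILQ_REMOVE(&tdctx->queue, bio, link);
                ++tdctx->issued_tps;
                /* ... hand the bio to the disk driver here ... */
        }
}

The reason for the two insert macros shows up in fq_dispatch(): queued reads always sit in front of the write backlog, so an interactive reader keeps making progress even while a heavy writer is being rate limited.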
--

Please take the time to try it and see how it works out for you. Some known issues are:

a) After a few hot swaps of scheduling policies a panic may occur. The reason for this is as yet unknown, but it has to do with the cleanup of the fqps.

b) Write-only performance (with theoretically no other I/O occurring) might be reduced. This is because the heuristic that detects whether the disk is saturated relies on the number of incomplete transactions, so even when the disk isn't really saturated but many I/Os are still in flight, the processes get rate limited, too.

Let me know how it works out for you, and send any suggestions on how to improve it. The next few weeks I'll have somewhat more limited time again due to exams coming up, but I'll try to be responsive.

Cheers,
Alex Hornung

[1]: http://leaf.dragonflybsd.org/~alexh/iosched-new.html
[2]: http://leaf.dragonflybsd.org/~alexh/iosched-current.diff