Hello,

I've been looking at improving block I/O performance and
I have some proposals, but I would like your opinion and
feedback before opening the (almost ready) pull requests.
I'd appreciate any feedback, advice and new ideas.

Proposals:
1) virtio-blk
2) libblock heavy buffering for sequential fs I/O

I will split them into separate mails.

1) virtio-blk:
The main performance culprit is that multi-block requests are
processed one block at a time [1].

[1]: 
https://github.com/HelenOS/helenos/blob/master/uspace/drv/block/virtio-blk/virtio-blk.c#L271

I tested it with an old HDD and an SSD I have in my server,
writing 20480 blocks (10 MiB) at the MAX IPC XFER size [2], i.e. in
batches of 128 blocks (MAX IPC XFER (64 KiB) / DEV BSIZE (512 B));
there is a sketch of the test loop below.
Setup: KVM-enabled QEMU with real block devices connected via
       -drive ...,if=virtio on Linux.

[2]: actual constant is called DATA_XFER_LIMIT in <abi/ipc/ipc.h>
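
For context, the test itself was just back-to-back batched writes; here is
a minimal sketch of the loop, assuming libblock's block_write_direct().
write_test() and the surrounding plumbing are made up for illustration,
and the header/call names are written from memory, so they may be
slightly off:

    #include <block.h>
    #include <errno.h>
    #include <loc.h>
    #include <offset.h>
    #include <stdint.h>

    #define TEST_BLOCKS   20480  /* 10 MiB at 512 B blocks */
    #define BATCH_BLOCKS  128    /* DATA_XFER_LIMIT (64 KiB) / DEV BSIZE (512 B) */
    #define DEV_BSIZE     512

    /* Write TEST_BLOCKS blocks in BATCH_BLOCKS-sized batches; 'sid' is the
     * already initialized block device and 'data' holds the whole 10 MiB. */
    static errno_t write_test(service_id_t sid, const uint8_t *data)
    {
        for (aoff64_t ba = 0; ba < TEST_BLOCKS; ba += BATCH_BLOCKS) {
            errno_t rc = block_write_direct(sid, ba, BATCH_BLOCKS,
                data + ba * DEV_BSIZE);
            if (rc != EOK)
                return rc;
        }

        return EOK;
    }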


1st try:
I threw my Fibril Group Executor (tm) :-) at it, but I only got a
6x - 9x speedup out of it, and I don't think 128 fibrils are healthy
for the scheduler, even though they spend most of their time asleep
waiting for the IRQ. A rough sketch of the idea is below.
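
The idea was roughly one fibril per block of the batch, all waiting on the
device and a shared counter. A simplified sketch (the call into the
existing single-block path is only hinted at in a comment, the real
executor keeps more state, and the fibril/synch calls are written from
memory):

    #include <errno.h>
    #include <fibril.h>
    #include <fibril_synch.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical per-block work item. */
    typedef struct {
        void *drv;               /* stands in for the virtio_blk_t softstate */
        uint64_t ba;             /* block address handled by this fibril */
        uint8_t *buf;            /* data for this single 512 B block */
        fibril_mutex_t *lock;
        fibril_condvar_t *cv;
        size_t *pending;         /* how many workers have not finished yet */
    } blk_work_t;

    static errno_t blk_worker(void *arg)
    {
        blk_work_t *w = arg;

        /*
         * Here the existing single-block path (virtio_blk_rw_block() in
         * the driver) would be called for w->ba / w->buf; the fibril then
         * sleeps until the device completes the request.
         */

        fibril_mutex_lock(w->lock);
        (*w->pending)--;
        fibril_condvar_signal(w->cv);
        fibril_mutex_unlock(w->lock);
        return EOK;
    }

    /* Spawn one fibril per block of the batch and wait for all of them
     * (error handling omitted). */
    static void rw_blocks_one_fibril_each(void *drv, uint64_t ba, size_t cnt,
        uint8_t *buf)
    {
        fibril_mutex_t lock;
        fibril_condvar_t cv;
        size_t pending = cnt;

        fibril_mutex_initialize(&lock);
        fibril_condvar_initialize(&cv);

        blk_work_t *work = calloc(cnt, sizeof(blk_work_t));
        if (work == NULL)
            return;

        for (size_t i = 0; i < cnt; i++) {
            work[i] = (blk_work_t) {
                .drv = drv, .ba = ba + i, .buf = buf + i * 512,
                .lock = &lock, .cv = &cv, .pending = &pending
            };
            fibril_add_ready(fibril_create(blk_worker, &work[i]));
        }

        fibril_mutex_lock(&lock);
        while (pending > 0)
            fibril_condvar_wait(&cv, &lock);
        fibril_mutex_unlock(&lock);

        free(work);
    }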


2nd try:
I upscaled the DMA request buffer allocation to 64 KiB per
request buffer (formerly only 512 B per rq_buf); there are 32 of
them, so that is 64 KiB * 32 = 2 MiB of memory.

With that I was able to inline the whole `virtio_blk_rw_block()` into
`virtio_blk_bd_rw_blocks()`, `memcpy()`-ing all the blocks at once, etc.,
thereby getting rid of the per-block loop... and I measured a 90x - 116x
speedup. A simplified sketch of the new path follows.
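
In other words, one request now covers the whole batch. A very
simplified sketch of that path; submit_and_wait() is a made-up stand-in
for the descriptor setup, virtqueue notification and IRQ wait that the
driver already does, and the real code is at [3]:

    #include <errno.h>
    #include <mem.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the existing descriptor setup + virtqueue
     * notification + waiting for the completion IRQ. */
    static errno_t submit_and_wait(uint64_t ba, size_t len, bool write);

    /*
     * Simplified multi-block path: 'rq_buf' is the per-request DMA buffer,
     * now 64 KiB instead of 512 B, so a whole MAX IPC XFER batch fits into
     * a single virtio request.
     */
    static errno_t rw_blocks_sketch(uint8_t *rq_buf, uint64_t ba, size_t cnt,
        uint8_t *data, bool write)
    {
        const size_t bsize = 512;        /* DEV BSIZE */
        const size_t len = cnt * bsize;  /* up to 64 KiB per request */

        if (write) {
            /* One memcpy() for the whole batch, not one per block. */
            memcpy(rq_buf, data, len);
        }

        /*
         * One request header (VIRTIO_BLK_T_OUT/IN, sector = ba) and one
         * data descriptor of 'len' bytes, instead of 'cnt' separate
         * single-sector requests.
         */
        errno_t rc = submit_and_wait(ba, len, write);
        if (rc != EOK)
            return rc;

        if (!write)
            memcpy(data, rq_buf, len);

        return EOK;
    }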

And it seems to work fine: I was able to create and write some files
on ext4 and then read them back on the Linux host. So the open
question is the memory consumption.

Code at [3]. What do you think?

[3]: 
https://github.com/mcimerman/helenos/blob/virtio-blk-multi-block/uspace/drv/block/virtio-blk/virtio-blk.c


--
Miroslav Cimerman


