Hello, I've been looking at increasing block I/O performance and I have some proposals, but I would like your opinion before opening the almost-ready pull requests. I'd appreciate any feedback, advice, and new ideas.
Proposals:
  1) virtio-blk
  2) libblock heavy buffering for sequential fs I/O

I will split them into separate mails.

1) virtio-blk:

The main performance culprit is that multiple blocks are processed one by one [1].

[1]: https://github.com/HelenOS/helenos/blob/master/uspace/drv/block/virtio-blk/virtio-blk.c#L271

I tested it with an old HDD and an SSD I have in my server, writing 20480 blocks (10 MiB) at the MAX IPC XFER size [2], i.e. in batches of 128 blocks (MAX IPC XFER (64 KiB) / DEV BSIZE (512)). Setup: KVM-enabled QEMU with real block devices connected via -drive ...,if=virtio on Linux.

[2]: the actual constant is called DATA_XFER_LIMIT in <abi/ipc/ipc.h>

1st try: I threw my Fibril Group Executor (tm) :-) at it, but I was only able to get a 6x - 9x speedup out of it, and I don't think 128 fibrils are healthy for the scheduler, even though they are mostly asleep waiting for the IRQ.

2nd try: I increased the DMA request buffer allocation to 64 KiB per request buffer (formerly only 512 B per rq_buf); there are 32 of them, so that is 64 KiB * 32 = 2 MiB of memory. This allowed me to inline the whole `virtio_blk_rw_block()` into `virtio_blk_bd_rw_blocks()`, `memcpy()`-ing all the blocks at once, etc., and thereby get rid of the per-block loop. I measured a 90x - 116x speedup. It seems to work fine: I was able to create and write some files on ext4 and then read them back on the Linux host.

So the open question is the memory consumption. Code at [3]. What do you think?

[3]: https://github.com/mcimerman/helenos/blob/virtio-blk-multi-block/uspace/drv/block/virtio-blk/virtio-blk.c
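To make the idea concrete, here is a rough sketch of the batched path. The signature is approximated and rq_buf_reserve()/rq_submit_and_wait()/rq_buf_release() are placeholder names standing in for the real descriptor and virtqueue handling; the actual code is at [3]. Only the overall shape (one 64 KiB DMA buffer, one memcpy(), one request per batch) is the point:

/*
 * Sketch only: one request covers the whole batch instead of looping
 * over single 512 B blocks.  Helpers marked "placeholder" are not the
 * real driver API.
 */

#define RQ_BUF_SIZE  (64 * 1024)  /* was 512 B per rq_buf; 32 bufs = 2 MiB */

static errno_t virtio_blk_bd_rw_blocks(bd_srv_t *bd, aoff64_t ba, size_t cnt,
    void *buf, bool write)
{
	virtio_blk_t *vblk = bd_srv_virtio_blk(bd);
	size_t nbytes = cnt * DEV_BSIZE;

	if (nbytes > RQ_BUF_SIZE)
		return ELIMIT;  /* libblock batches at most 128 blocks */

	/* Grab one of the 32 preallocated 64 KiB DMA request buffers. */
	uint16_t desc = rq_buf_reserve(vblk);            /* placeholder */
	void *rq_buf = vblk->rq_buf[desc];

	if (write) {
		/* One memcpy() for the whole batch, no per-block loop. */
		memcpy(rq_buf, buf, nbytes);
	}

	/* Single virtio request for all cnt blocks; sleep until the IRQ. */
	errno_t rc = rq_submit_and_wait(vblk, desc, ba, cnt, write);  /* placeholder */

	if (rc == EOK && !write)
		memcpy(buf, rq_buf, nbytes);

	rq_buf_release(vblk, desc);                      /* placeholder */
	return rc;
}

-- Miroslav Cimerman
_______________________________________________
HelenOS-devel mailing list
HelenOS-devel@lists.modry.cz
http://lists.modry.cz/listinfo/helenos-devel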