On Fri, Jun 26, 2015 at 3:17 PM, Z Zhang <[email protected]> wrote:
> Hi Ilya,
>
> I saw your recent email about krbd splitting large I/Os into smaller
> I/Os; see the link below.
>
> https://www.mail-archive.com/[email protected]/msg20587.html
>
> I just tried it on my ceph cluster using kernel 3.10.0-1. I set both
> max_sectors_kb and max_hw_sectors_kb of the rbd device to 4096.
>
> Use fio with 4M block size for read:
>
> Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
> rbd3        81.00     0.00   135.00     0.00   108.00     0.00  1638.40     2.72    20.15    20.15     0.00     7.41   100.00
>
>
> Use fio with 1M or 2M block size for read:
>
> Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
> rbd3         0.00     0.00   213.00     0.00   106.50     0.00  1024.00     2.56    12.02    12.02     0.00     4.69   100.00
>
>
> Use fio with 4M block size for write:
>
> Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
> rbd3         0.00    40.00     0.00    40.00     0.00    40.00  2048.00     2.87    70.90     0.00    70.90    24.90    99.60
>
>
> Use fio with 1M or 2M block size for write:
>
> Device:    rrqm/s   wrqm/s      r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
> rbd3         0.00     0.00     0.00    80.00     0.00    40.00  1024.00     3.55    48.20     0.00    48.20    12.50   100.00
>
>
> So why is the I/O size here far less than 4096 (with the default value of
> 512, every I/O size is 1024)? Are there other parameters that need to be
> adjusted, or is it about this kernel version?

It's about this kernel version. Assuming you are doing direct I/Os
with fio, setting max_sectors_kb to 4096 is really the only thing you
can do, and that's enough to *sometimes* see 8192-sector (i.e. 4M) I/Os.
The problem is the max_segments value, which in 3.10 is 128 and which
you cannot adjust via sysfs.
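
To see which limits are in effect on a given device, the queue
attributes can be read straight from sysfs. The following is only a
minimal sketch: the helper name read_queue_limits and the sysfs_root
parameter are made up for illustration, and rbd3 is simply the device
from the iostat output above.

```python
from pathlib import Path

# The block-queue attributes discussed in this thread.
QUEUE_ATTRS = ("max_sectors_kb", "max_hw_sectors_kb", "max_segments")


def read_queue_limits(device, sysfs_root="/sys/block"):
    """Return the queue limits of `device` as a dict of integers.

    `sysfs_root` is a parameter only so the helper can be pointed at a
    different directory; on a live system the default is what you want.
    """
    queue = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in QUEUE_ATTRS:
        attr = queue / name
        # An attribute that is not exposed is reported as None.
        limits[name] = int(attr.read_text().strip()) if attr.exists() else None
    return limits
```

Calling read_queue_limits("rbd3") on the client should show
max_sectors_kb at whatever you set and max_segments at the value the
driver chose; the latter can be read this way but not changed.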

It all comes down to the memory allocator. To get a 4M I/O, the total
number of segments (physically contiguous chunks of memory) in the
8 bios (8*512k = 4M) that need to be merged has to be <= 128. With 4K
pages, 4M is 1024 pages, so on average each segment has to cover at
least 8 physically contiguous pages. When you are allocated such nice
and contiguous bios, you get 4M I/Os. In other cases you don't.
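
The segment arithmetic can be sketched as a toy model. This is not
what the block layer actually runs (the real merge logic also honours
limits such as the maximum segment size); it only counts runs of
physically consecutive pages. The 4K page size, the function names and
the sample page frame lists are all assumptions made for illustration.

```python
PAGE_SIZE = 4096      # assumption: 4K pages
SECTOR_SIZE = 512     # iostat reports avgrq-sz in 512-byte sectors
MAX_SEGMENTS = 128    # the 3.10 limit mentioned above


def count_segments(page_frames):
    """Count runs of physically consecutive page frame numbers."""
    segments = 0
    prev = None
    for pfn in page_frames:
        if prev is None or pfn != prev + 1:
            segments += 1  # a gap in physical memory starts a new segment
        prev = pfn
    return segments


def fits_in_one_request(page_frames, max_segments=MAX_SEGMENTS):
    """True if the pages could be described by at most max_segments."""
    return count_segments(page_frames) <= max_segments


pages_per_4m = 4 * 1024 * 1024 // PAGE_SIZE  # 1024 pages

# Hypothetical layouts of the buffer in physical memory.
contiguous = list(range(1000, 1000 + pages_per_4m))
scattered = list(range(0, 2 * pages_per_4m, 2))
runs_of_8 = [base + i for base in range(0, 128 * 16, 16) for i in range(8)]

for name, frames in (("contiguous", contiguous),
                     ("scattered", scattered),
                     ("runs of 8", runs_of_8)):
    print(name, count_segments(frames), fits_in_one_request(frames))

# With no two pages adjacent, 128 segments carry only 128 pages.
worst_case_sectors = MAX_SEGMENTS * PAGE_SIZE // SECTOR_SIZE
print("worst case request size in sectors:", worst_case_sectors)
```

Under these assumptions the fully scattered case is capped at 128
pages, i.e. 512k or 1024 sectors, which would be consistent with the
avgrq-sz of 1024.00 in the 1M/2M runs quoted above.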

This will be fixed in 4.2, along with a bunch of other things. This
particular max_segments fix is a one-liner, so we will probably backport
it to older kernels, including 3.10.

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com