On Fri, Nov 28, 2014 at 5:46 PM, Dan Van Der Ster
<[email protected]> wrote:
> Hi Andrei,
> Yes, I’m testing from within the guest.
>
> Here is an example. First, I do 2MB reads with max_sectors_kb=512, and we
> see each read split into four (fio sees 25 iops, though iostat reports 100
> smaller iops):
>
> # echo 512 > /sys/block/vdb/queue/max_sectors_kb # this is the default
> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
> fio-2.0.13
> Starting 1 process
> Jobs: 1 (f=1): [R] [100.0% done] [51200K/0K/0K /s] [25 /0 /0 iops] [eta 00m:00s]
>
> Meanwhile, iostat reports 100 iops with an average request size of 1024
> sectors (i.e. 512kB):
>
> Device:   rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
> vdb         0.00    0.00  100.00   0.00   50.00   0.00   1024.00      3.02  30.25  10.00  100.00
>
>
>
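To see how those two numbers line up: each 2M read exceeds the 512K soft
limit and is split into four device-level requests, so fio's 25
completions/s show up as 100 iops in iostat at the same 50 MB/s. A trivial
standalone check, using only the values shown above:

    #include <stdio.h>

    int main(void)
    {
        unsigned int io_kb          = 2048; /* fio blocksize=2m */
        unsigned int max_sectors_kb = 512;  /* queue soft limit (default) */
        unsigned int fio_iops       = 25;   /* completions fio observes */

        /* each fio I/O is split into io_kb / max_sectors_kb requests */
        unsigned int splits = io_kb / max_sectors_kb;              /* 4 */

        printf("device requests per fio I/O: %u\n", splits);
        printf("device iops: %u\n", fio_iops * splits);            /* 100 */
        printf("throughput:  %u MB/s\n", fio_iops * io_kb / 1024); /* 50 */
        return 0;
    }
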
> Now increase max_sectors_kb to 4096 (4MB), and the IOs are no longer split:
>
> # echo 4096 > /sys/block/vdb/queue/max_sectors_kb
> # fio --readonly --name /dev/vdb --rw=read --size=1G --ioengine=libaio --direct=1 --runtime=10s --blocksize=2m
> /dev/vdb: (g=0): rw=read, bs=2M-2M/2M-2M/2M-2M, ioengine=libaio, iodepth=1
> fio-2.0.13
> Starting 1 process
> Jobs: 1 (f=1): [R] [100.0% done] [200.0M/0K/0K /s] [100 /0 /0 iops] [eta 00m:00s]
>
> iostat reports 100 iops, each read 4096 sectors (i.e. 2MB):
>
> Device:   rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
> vdb       300.00    0.00  100.00   0.00  200.00   0.00   4096.00      0.99   9.94   9.94   99.40
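
Note that the echo only changes the live queue: the setting resets whenever
the device is re-created (reboot, re-mapping), so it has to be reapplied per
device, e.g. from a udev rule or a small helper along the lines of this
sketch (hypothetical, needs root; the device name and value default to the
ones from the example above):

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        /* device and value default to the ones used in the example */
        const char *dev = argc > 1 ? argv[1] : "vdb";
        const char *kb  = argc > 2 ? argv[2] : "4096";

        char path[128];
        snprintf(path, sizeof(path), "/sys/block/%s/queue/max_sectors_kb", dev);

        /* same effect as: echo 4096 > /sys/block/vdb/queue/max_sectors_kb */
        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            return 1;
        }
        fprintf(f, "%s\n", kb);
        return fclose(f) == 0 ? 0 : 1;
    }
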
We set the hard request size limit to the rbd object size (4M typically):

    blk_queue_max_hw_sectors(q, segment_size / SECTOR_SIZE);

but the block layer then sets the soft limit for fs requests to 512K:

    BLK_DEF_MAX_SECTORS = 1024,

    limits->max_sectors = min_t(unsigned int, max_hw_sectors,
                                BLK_DEF_MAX_SECTORS);

which you are supposed to raise on a per-device basis via sysfs. We could
probably raise the soft limit to the rbd object size by default as well -
I don't see any harm in that.
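
For illustration, here is what that interaction works out to - a standalone
userspace model, not the actual kernel code, using the constants quoted
above:

    #include <stdio.h>

    #define SECTOR_SIZE         512   /* bytes */
    #define BLK_DEF_MAX_SECTORS 1024  /* block layer's default soft cap */

    int main(void)
    {
        unsigned int segment_size   = 4 * 1024 * 1024;            /* rbd object size: 4M */
        unsigned int max_hw_sectors = segment_size / SECTOR_SIZE; /* hard limit: 8192 */

        /* the equivalent of the min_t() above */
        unsigned int max_sectors = max_hw_sectors < BLK_DEF_MAX_SECTORS
                                       ? max_hw_sectors : BLK_DEF_MAX_SECTORS;

        printf("max_hw_sectors_kb = %u\n", max_hw_sectors / 2); /* 4096 */
        printf("max_sectors_kb    = %u\n", max_sectors / 2);    /* 512 */
        return 0;
    }

which prints exactly the 4096/512 pair you see in /sys/block/vdb/queue by
default.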
Thanks,
Ilya