I'm having an issue with small sequential reads (for example, searching
through source code files), and I found that multiple small reads
within a 4MB boundary fetch the same object from the OSD multiple
times, because only part of the object gets inserted into the RBD cache.

How to reproduce: an RBD image accessed from a QEMU VM using virtio-scsi,
with writethrough cache on. Monitor with perf dump on the RBD client. The
image is filled with zeroes in advance. RBD readahead is off.

1 - Small read from a previously unread section of the disk:
dd if=/dev/sdb ibs=512 count=1 skip=41943040 iflag=skip_bytes
Notes: dd cannot read less than 512 bytes. The skip offset is arbitrary; it
only avoids the beginning of the disk, which would already have been read at boot.
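With the 4MB (4194304-byte) objects mentioned above, a byte offset on the disk can be mapped to an RBD object index and an offset inside that object. This is a minimal sketch that assumes the image uses 4 MiB objects with no custom striping (an assumption about this image's layout). The helper name `object_extent` is hypothetical:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed RBD object size: 4194304 bytes

def object_extent(offset: int, length: int, object_size: int = OBJECT_SIZE):
    """Map a disk read to (object_index, start_in_object, end_in_object).

    Assumes plain 4 MiB objects (no fancy striping) and a read that does
    not cross an object boundary.
    """
    index, start = divmod(offset, object_size)
    end = start + length
    if end > object_size:
        raise ValueError("read crosses an object boundary")
    return index, start, end

if __name__ == "__main__":
    # Step 1: a 512-byte read at byte offset 41943040 (40 MiB)
    print(object_extent(41943040, 512))  # -> (10, 0, 512)
```

Under that assumption, the step 1 read lands at the very start of object 10 (counting from 0).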

Expected outcomes: perf dump should show a +1 increase in rd,
cache_ops_miss and op_r. This happens as expected.
It should also show a 4194304 increase in data_read, as a whole object is put
into the cache. Instead it increases by 4096 (not sure why 4096, by the way).
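Comparing perf dump output by hand before and after each dd is error-prone. This is a minimal sketch that flattens two perf dump JSON snapshots (for example, saved from the RBD client's admin socket) and prints every numeric counter that changed. It makes no assumption about the section names. The function names and file names are hypothetical:

```python
import json
import sys
from typing import Any, Dict

def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Flatten nested perf dump sections into {"section.counter": value}."""
    flat: Dict[str, float] = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat

def counter_deltas(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, float]:
    """Return the change of every numeric counter that differs between snapshots."""
    old, new = flatten(before), flatten(after)
    return {k: new[k] - old.get(k, 0) for k in new if new[k] != old.get(k, 0)}

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        deltas = counter_deltas(json.load(f_before), json.load(f_after))
    for name, delta in sorted(deltas.items()):
        print(f"{name}: {delta:+g}")
```

Usage: save one snapshot before the dd and one after, for example with `ceph --admin-daemon <client asok path> perf dump > before.json`. Then run `python3 perf_delta.py before.json after.json` and look for the rd, op_r, cache_ops_hit, cache_ops_miss and data_read lines.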

2 - Small read less than 4MB away (in the example, +5000 bytes):
dd if=/dev/sdb ibs=512 count=1 skip=41948040 iflag=skip_bytes
Expected outcomes: perf dump should show a +1 increase in cache_ops_hit.
Instead, cache_ops_miss increases.
It should show a 4194304 increase in data_read, as a whole object is put
into the cache. Instead it increases by 4096.
op_r should not increase. Instead it increases by one, indicating that
the object was fetched from the OSD again.
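The gap between the expected and observed behaviour can be illustrated with a toy model. This is not librbd's ObjectCacher. The class and field names are hypothetical, and the model assumes 4 MiB objects and a cache that keeps either only the bytes that were requested or the whole object. It replays the two dd reads above and counts hits, misses and OSD reads (the analogue of op_r):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed 4 MiB RBD objects

@dataclass
class ToyCache:
    whole_object: bool  # True: insert the full object on a miss
    extents: Dict[int, List[Tuple[int, int]]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    osd_reads: int = 0

    def read(self, offset: int, length: int) -> None:
        obj, start = divmod(offset, OBJECT_SIZE)
        end = start + length  # assumes the read stays inside one object
        cached = self.extents.setdefault(obj, [])
        if any(s <= start and end <= e for s, e in cached):
            self.hits += 1
            return
        self.misses += 1
        self.osd_reads += 1
        # Insert either the whole object or only the requested extent.
        cached.append((0, OBJECT_SIZE) if self.whole_object else (start, end))

reads = [(41943040, 512), (41948040, 512)]  # the two dd reads above
for policy in (False, True):
    cache = ToyCache(whole_object=policy)
    for off, length in reads:
        cache.read(off, length)
    print(f"whole_object={policy}: hits={cache.hits} "
          f"misses={cache.misses} osd_reads={cache.osd_reads}")
```

When it keeps only the requested extent, the model reports two misses and two OSD reads, which matches what perf dump shows above. When it keeps the whole object, the second read becomes a hit, with one miss and one OSD read in total.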

In my tests, this appears to cause a 6- to 20-fold performance loss
for small sequential reads.

Is it by design that the RBD cache inserts only the portion requested by
the client instead of the whole object that was fetched? Or could a
tunable in one of my layers (filesystem, block device, QEMU, RBD...) be
preventing this?

Regards,
-- 
Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
GPG Key: 05EF 1D2F FE61 747D 1FC8  27C3 7FAC 7D26 472F 4409
https://fsf.org | https://gnu.org


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
