Hi, I am doing some performance characterization experiments on Ceph with KVM guests, and I'm observing significantly higher read latency when using the QEMU RBD client compared to krbd. Is that expected, or have I missed some tuning knobs that would improve it?
Cluster details:
Note that this cluster was built for evaluation purposes, not production, hence
the choice of small SSDs with low endurance specs.
Client host OS: Debian, 4.7.0 kernel
QEMU version 2.7.0
Ceph version Jewel 10.2.3
Client and OSD CPU: Xeon D-1541 2.1 GHz
OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per
SSD, XFS data file system (15 OSDs total)
Disks: DC S3510 240GB
Network: 10 GbE, dedicated switch for storage traffic
Guest OS: Debian, virtio drivers
Performance testing was done with fio on raw disk devices using this config:
ioengine=libaio
iodepth=128
direct=1
size=100%
rw=randread
bs=4k
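Equivalently, as a one-shot command line (the job name and device path here are
placeholders; the target is /dev/rbd0 on the host for case 1 and the guest's
virtio disk for cases 2 and 3):
fio --name=randread-4k --ioengine=libaio --iodepth=128 --direct=1 --size=100% --rw=randread --bs=4k --filename=/dev/rbd0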
Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
IOPS: 142k
Average latency: 0.9 msec
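In cases 1 and 2 the image is mapped with krbd on the client host before
running fio, roughly as follows (a sketch, assuming the same pool/image/user
names as in the case 3 config below):
rbd map app/image1 --id app1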
Case 2: krbd, fio running in a guest (libvirt config below)
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/dev/rbd0'/>
<backingStore/>
<target dev='vdb' bus='virtio'/>
</disk>
IOPS: 119k
Average latency: 1.1 msec
Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
<disk type='network' device='disk'>
<driver name='qemu'/>
<auth username='app1'>
<secret type='ceph' usage='app_pool'/>
</auth>
<source protocol='rbd' name='app/image1'/>
<target dev='vdc' bus='virtio'/>
</disk>
IOPS: 25k
Average latency: 5.2 msec
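For reference, the disk definition above should translate into a QEMU drive
spec along these lines (a sketch; the monitors and keyring come from the
default ceph.conf and libvirt secret, and the exact options libvirt generates
may differ):
-drive file=rbd:app/image1:id=app1,format=raw,if=virtio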
The question is why the test with the QEMU RBD client (case 3) shows about 4 msec
of additional latency compared to the guest using the krbd-mapped image (case 2).
Note that the IOPS bottleneck for all of these cases is the rate at which the
client issues requests, which is limited by the average latency and the maximum
number of outstanding requests (128). Since the latency is the dominant factor
in average read throughput for these small accesses, we would really like to
understand the source of the additional latency.
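For reference, with a queue depth of 128 the achievable IOPS works out to
roughly iodepth / average latency, which lines up with the measured numbers:
128 / 0.9 ms ≈ 142k IOPS (case 1)
128 / 1.1 ms ≈ 116k IOPS (case 2)
128 / 5.2 ms ≈ 25k IOPS (case 3)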
Thanks,
Phil
