Thanks everyone for the suggestions.  Disabling the RBD cache, disabling debug 
logging, and building QEMU with jemalloc each had a significant impact.  
Performance is up from 25K IOPS to 63K IOPS.  Hopefully the ongoing work to 
reduce the number of buffer copies will yield further improvements.
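
(In case it helps anyone else: if I recall the QEMU configure option correctly 
it is --enable-jemalloc, but check ./configure --help for your version.)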

I have a follow-up question about the debug logging.  Is there any way to dump 
the in-memory logs from the QEMU RBD client?  If not (and I couldn't find a way 
to do this), then nothing is lost by disabling the logging on client machines.

Thanks,
Phil

> On Feb 16, 2017, at 1:20 PM, Jason Dillaman <[email protected]> wrote:
> 
> Few additional suggestions:
> 
> 1) For high IOPS random read workloads, the librbd cache is most likely going 
> to be a bottleneck and is providing zero benefit. Recommend setting 
> "cache=none" on your librbd QEMU disk to disable it.
> 
> 2) Disable logging via your ceph.conf. Example settings:
> 
> debug_auth = 0/0
> debug_buffer = 0/0
> debug_context = 0/0
> debug_crypto = 0/0
> debug_finisher = 0/0
> debug_ms = 0/0
> debug_objectcacher = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_striper = 0/0
> debug_tp = 0/0
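> 
> (These go in the ceph.conf used by the QEMU process on the hypervisor host, 
> typically in the [global] or [client] section.)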
> 
> The above two config changes on my small development cluster take my librbd 
> 4K random read IOPS from ~9.5K to ~12.5K (+32%).
> 
> 3) librbd / librados is very heavy on small memory allocations in the IO 
> path, and previous reports have indicated that using jemalloc with QEMU shows 
> large improvements.
> 
> LD_PRELOADing jemalloc within fio using the optimized config takes me from 
> ~12.5K IOPS to ~13.5K IOPS (+8%).
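> 
> Something along these lines works for the fio test (library path and job 
> file name are placeholders; adjust for your distro and jemalloc build):
> 
> LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 fio 4k-randread.fio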
> 
> 
> On Thu, Feb 16, 2017 at 3:38 PM, Steve Taylor <[email protected]> wrote:
> 
> You might try running fio directly on the host using the rbd ioengine (direct 
> librbd) and see how that compares. The major difference between that and the 
> krbd test will be the page cache readahead, which will be present in the krbd 
> stack but not with the rbd ioengine. I would have expected the guest OS to 
> normalize that some due to its own page cache in the librbd test, but that 
> might at least give you some more clues about where to look further.
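> 
> A minimal fio job for the rbd ioengine might look like this (pool, image, 
> and client names taken from your case 3 config; adjust as needed):
> 
> [rbd-direct-randread]
> ioengine=rbd
> clientname=app1
> pool=app
> rbdname=image1
> rw=randread
> bs=4k
> iodepth=128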
> 
> 
> 
> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799
> 
> 
> -----Original Message-----
> From: ceph-users [mailto:[email protected]] On Behalf Of Phil Lacroute
> Sent: Thursday, February 16, 2017 11:54 AM
> To: [email protected] <mailto:[email protected]>
> Subject: [ceph-users] KVM/QEMU rbd read latency
> 
> Hi,
> 
> I am doing some performance characterization experiments for ceph with KVM 
> guests, and I’m observing significantly higher read latency when using the 
> QEMU rbd client compared to krbd.  Is that expected or have I missed some 
> tuning knobs to improve this?
> 
> Cluster details:
> Note that this cluster was built for evaluation purposes, not production, 
> hence the choice of small SSDs with low endurance specs.
> Client host OS: Debian, 4.7.0 kernel
> QEMU version 2.7.0
> Ceph version Jewel 10.2.3
> Client and OSD CPU: Xeon D-1541 2.1 GHz
> OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per 
> SSD, XFS data file system (15 OSDs total)
> Disks: DC S3510 240GB
> Network: 10 GbE, dedicated switch for storage traffic
> Guest OS: Debian, virtio drivers
> 
> Performance testing was done with fio on raw disk devices using this config:
> ioengine=libaio
> iodepth=128
> direct=1
> size=100%
> rw=randread
> bs=4k
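> 
> (Each run points fio at the relevant raw block device with a filename line, 
> e.g. filename=/dev/rbd0 for case 1 on the host and filename=/dev/vdb or 
> /dev/vdc inside the guest for cases 2 and 3.)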
> 
> Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
> IOPS: 142k
> Average latency: 0.9 msec
> 
> Case 2: krbd, fio running in a guest (libvirt config below)
>    <disk type='file' device='disk'>
>      <driver name='qemu' type='raw' cache='none'/>
>      <source file='/dev/rbd0'/>
>      <backingStore/>
>      <target dev='vdb' bus='virtio'/>
>    </disk>
> IOPS: 119k
> Average Latency: 1.1 msec
> 
> Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
>    <disk type='network' device='disk'>
>      <driver name='qemu'/>
>      <auth username='app1'>
>        <secret type='ceph' usage='app_pool'/>
>      </auth>
>      <source protocol='rbd' name='app/image1'/>
>      <target dev='vdc' bus='virtio'/>
>    </disk>
> IOPS: 25k
> Average Latency: 5.2 msec
> 
> The question is why the test with the QEMU RBD client (case 3) shows 4 msec 
> of additional latency compared to the guest using the krbd-mapped image (case 2).
> 
> Note that the IOPS bottleneck for all of these cases is the rate at which the 
> client issues requests, which is limited by the average latency and the 
> maximum number of outstanding requests (128).  Since the latency is the 
> dominant factor in average read throughput for these small accesses, we would 
> really like to understand the source of the additional latency.
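> 
> (With 128 outstanding requests and 5.2 msec average latency, the expected 
> rate is roughly 128 / 0.0052 s ≈ 25K IOPS, which matches the case 3 
> measurement, so the extra latency fully accounts for the IOPS drop.)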
> 
> Thanks,
> Phil
> 
> 
> 
> 
> 
> 
> -- 
> Jason


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
