Thanks, everyone, for the suggestions. Disabling the RBD cache, disabling the debug logging, and building QEMU with jemalloc each had a significant impact: performance is up from 25K IOPS to 63K IOPS. Hopefully the ongoing work to reduce the number of buffer copies will yield further improvements.
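
For anyone who wants to reproduce this, the changes were roughly as follows (a sketch rather than exact diffs; the configure flag assumes QEMU 2.6 or later, and an LD_PRELOAD of libjemalloc should give a similar effect if you'd rather not rebuild):

libvirt guest XML, to disable the librbd cache on the rbd-backed disk:

  <driver name='qemu' cache='none'/>

ceph.conf on the client host: the debug_* = 0/0 settings Jason lists below, e.g. under [client].

QEMU build against jemalloc:

  ./configure --enable-jemalloc
  make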
I have a follow-up question about the debug logging. Is there any way to dump the in-memory logs from the QEMU RBD client? If not (and I couldn’t find a way to do this), then nothing is lost by disabling the logging on client machines.

Thanks,
Phil

> On Feb 16, 2017, at 1:20 PM, Jason Dillaman <[email protected]> wrote:
>
> A few additional suggestions:
>
> 1) For high-IOPS random read workloads, the librbd cache is most likely going
> to be a bottleneck and is providing zero benefit. Recommend setting
> "cache=none" on your librbd QEMU disk to disable it.
>
> 2) Disable logging via your ceph.conf. Example settings:
>
> debug_auth = 0/0
> debug_buffer = 0/0
> debug_context = 0/0
> debug_crypto = 0/0
> debug_finisher = 0/0
> debug_ms = 0/0
> debug_objectcacher = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_striper = 0/0
> debug_tp = 0/0
>
> The above two config changes on my small development cluster take my librbd
> 4K random read IOPS from ~9.5K to ~12.5K (+32%).
>
> 3) librbd / librados is very heavy with small memory allocations on the IO
> path, and previous reports have indicated that using jemalloc with QEMU shows
> large improvements.
>
> LD_PRELOADing jemalloc within fio using the optimized config takes me from
> ~12.5K IOPS to ~13.5K IOPS (+8%).
>
> On Thu, Feb 16, 2017 at 3:38 PM, Steve Taylor <[email protected]> wrote:
>
> You might try running fio directly on the host using the rbd ioengine (direct
> librbd) and see how that compares. The major difference between that and the
> krbd test will be the page cache readahead, which will be present in the krbd
> stack but not with the rbd ioengine. I would have expected the guest OS to
> normalize that some due to its own page cache in the librbd test, but that
> might at least give you some more clues about where to look further.
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
>
> -----Original Message-----
> From: ceph-users [mailto:[email protected]] On Behalf Of Phil Lacroute
> Sent: Thursday, February 16, 2017 11:54 AM
> To: [email protected]
> Subject: [ceph-users] KVM/QEMU rbd read latency
>
> Hi,
>
> I am doing some performance characterization experiments for Ceph with KVM
> guests, and I’m observing significantly higher read latency when using the
> QEMU RBD client compared to krbd. Is that expected, or have I missed some
> tuning knobs to improve this?
>
> Cluster details:
> Note that this cluster was built for evaluation purposes, not production,
> hence the choice of small SSDs with low endurance specs.
> Client host OS: Debian, 4.7.0 kernel
> QEMU version 2.7.0
> Ceph version Jewel 10.2.3
> Client and OSD CPU: Xeon D-1541 2.1 GHz
> OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per
> SSD, XFS data file system (15 OSDs total)
> Disks: DC S3510 240GB
> Network: 10 GbE, dedicated switch for storage traffic
> Guest OS: Debian, virtio drivers
>
> Performance testing was done with fio on raw disk devices using this config:
> ioengine=libaio
> iodepth=128
> direct=1
> size=100%
> rw=randread
> bs=4k
>
> Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
> IOPS: 142k
> Average latency: 0.9 msec
>
> Case 2: krbd, fio running in a guest (libvirt config below)
> <disk type='file' device='disk'>
>   <driver name='qemu' type='raw' cache='none'/>
>   <source file='/dev/rbd0'/>
>   <backingStore/>
>   <target dev='vdb' bus='virtio'/>
> </disk>
> IOPS: 119k
> Average latency: 1.1 msec
>
> Case 3: QEMU RBD client, fio running in a guest (libvirt config below)
> <disk type='network' device='disk'>
>   <driver name='qemu'/>
>   <auth username='app1'>
>     <secret type='ceph' usage='app_pool'/>
>   </auth>
>   <source protocol='rbd' name='app/image1'/>
>   <target dev='vdc' bus='virtio'/>
> </disk>
> IOPS: 25k
> Average latency: 5.2 msec
>
> The question is why the test with the QEMU RBD client (case 3) shows 4 msec
> of additional latency compared to the guest using the krbd-mapped image (case 2).
>
> Note that the IOPS bottleneck for all of these cases is the rate at which the
> client issues requests, which is limited by the average latency and the
> maximum number of outstanding requests (128). Since the latency is the
> dominant factor in average read throughput for these small accesses, we would
> really like to understand the source of the additional latency.
>
> Thanks,
> Phil
>
> --
> Jason
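
P.S. Following up on Steve's suggestion above, a fio job for exercising librbd directly on the host (no guest, no krbd) would look roughly like this. It reuses the pool, image, and client from case 3 and assumes fio was built with rbd ioengine support:

[global]
ioengine=rbd
clientname=app1
pool=app
rbdname=image1
invalidate=0
rw=randread
bs=4k
direct=1

[librbd-randread]
iodepth=128

That should isolate librbd latency from the virtio and guest page-cache layers.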
