I currently have about 250 VMs, ranging from 16GB to 2TB in size. What I found, 
after about a week of testing, sniffing, and observing, is that the larger 
read-ahead buffer causes the VM to chunk reads over to Ceph, and in doing so 
lets them align better with the 4MB object size that Ceph uses. If I dropped 
the read-ahead below 16MB, performance degraded almost linearly, all the way 
down to the 16KB default. When I increased it above 16MB, there were some 
intermittent gains, but overall nothing to write home about.
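
For anyone who wants to experiment before committing the value to a udev rule, 
the guest read-ahead can be changed at runtime and re-tested. This is only a 
sketch, assuming a virtio disk at /dev/vda (a hypothetical device name; adjust 
for your guests), and the setting does not persist across reboots:

```shell
# Check the current read-ahead (in KB) for the guest disk
cat /sys/block/vda/queue/read_ahead_kb

# Bump it to 16 MB (16384 KB) for testing
echo 16384 > /sys/block/vda/queue/read_ahead_kb

# Equivalent via blockdev, which works in 512-byte sectors (16 MB = 32768 sectors)
blockdev --setra 32768 /dev/vda
blockdev --getra /dev/vda
```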

For reference, our Ceph cluster is 88TB, spread across 88 1TB SSDs. Each 
storage node has 100GbE connectivity, and each cloud host (Proxmox) has 40GbE. 
I'm able to sustain 3400 IOPS regularly, and have seen spikes as high as 5200+ 
IOPS in our Calamari logs. In addition, thanks to some clever use of LACP and 
MLAG, I'm able to sustain 3000+ IOPS per cloud host simultaneously. Our VM 
workload at this time consists of MSSQL servers, MySQL servers, and BI servers 
(Pentaho). We also run our ELK stack and our collectd/Graphite/Grafana stack 
in this specific cloud. 

In the end, the root cause of the issue, based on my testing and 
investigation, is the mismatch between the block sizes used by the VMs (4KB 
requests, buffered up to the 16KB default read-ahead) and the 4MB objects 
Ceph uses.
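
To put rough numbers on that mismatch, here is a back-of-the-envelope check 
in plain shell arithmetic, using only the sizes discussed above:

```shell
# Back-of-the-envelope arithmetic for the block-size mismatch described above.
object_kb=$((4 * 1024))      # default RBD object size: 4 MB
default_ra_kb=16             # stock guest read-ahead: 16 KB
tuned_ra_kb=$((16 * 1024))   # the 16 MB sweet spot from testing

# At the 16 KB default, reading one 4 MB object takes 256 separate requests.
echo "requests per object at default read-ahead: $((object_kb / default_ra_kb))"

# At 16 MB, a single read-ahead window spans 4 whole objects.
echo "objects covered by one tuned read-ahead window: $((tuned_ra_kb / object_kb))"
```

That 256-to-1 ratio of requests per object is consistent with the near-linear 
degradation I saw as the read-ahead shrank toward the default.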

-- 
Stephen Mercier
Senior Systems Architect
Attainia, Inc.
Phone: 866-288-2464 ext. 727
Email: [email protected]
Web: www.attainia.com

Capital equipment lifecycle planning & budgeting solutions for healthcare

On Jun 30, 2015, at 10:49 AM, Tuomas Juntunen wrote:

> Hi
>  
> This is something I was thinking too. But it doesn’t take away the problem.
>  
> Can you share your setup and how many VMs you are running? That would give 
> us a starting point for sizing our setup.
>  
> Thanks
>  
> Br,
> Tuomas
>  
> From: Stephen Mercier [mailto:[email protected]] 
> Sent: 30 June 2015 20:32
> To: Tuomas Juntunen
> Cc: 'Somnath Roy'; 'ceph-users'
> Subject: Re: [ceph-users] Very low 4k randread performance ~1000iops
>  
> I ran into the same problem. What we did, and have been using since, is 
> increase the read-ahead buffer in the VMs to 16MB (the sweet spot we settled 
> on after testing). This isn't a solution for all scenarios, but for our use 
> case, it was enough to bring performance in line with expectations.
>  
> In Ubuntu, we added the following udev config to facilitate this:
>  
> root@ubuntu:/lib/udev/rules.d# vi /etc/udev/rules.d/99-virtio.rules 
>  
> SUBSYSTEM=="block", ATTR{queue/rotational}=="1", ACTION=="add|change", 
> KERNEL=="vd[a-z]", ATTR{bdi/read_ahead_kb}="16384", 
> ATTR{queue/read_ahead_kb}="16384", ATTR{queue/scheduler}="deadline"
>  
>  
> Cheers,
> -- 
> Stephen Mercier
>  
>  
>  
> On Jun 30, 2015, at 10:18 AM, Tuomas Juntunen wrote:
> 
> 
> Hi
>  
> It’s probably not hitting the disks, but that really doesn’t matter. The 
> point is we have very responsive VMs while writing, and that is what the 
> users will see.
> The IOPS we get with sequential reads are good, but random reads are way 
> too low.
>  
> Is using SSDs as OSDs the only way to get it up, or is there some tunable 
> which would enhance it? I would assume Linux caches reads in memory and 
> serves them from there, but at least for now we don’t see it.
>  
> Br,
> Tuomas
>  
>  
> From: Somnath Roy [mailto:[email protected]] 
> Sent: 30 June 2015 19:24
> To: Tuomas Juntunen; 'ceph-users'
> Subject: RE: [ceph-users] Very low 4k randread performance ~1000iops
>  
> Break it down; try fio-rbd to see what performance you are getting.
> But I am really surprised you are getting > 100k IOPS for writes. Did you 
> check whether it is hitting the disks?
>  
> Thanks & Regards
> Somnath
>  
> From: ceph-users [mailto:[email protected]] On Behalf Of 
> Tuomas Juntunen
> Sent: Tuesday, June 30, 2015 8:33 AM
> To: 'ceph-users'
> Subject: [ceph-users] Very low 4k randread performance ~1000iops
>  
> Hi
>  
> I have been trying to figure out why our 4k random reads in VMs are so bad. 
> I am using fio to test this.
>  
> Write: 170k IOPS
> Random write: 109k IOPS
> Read: 64k IOPS
> Random read: 1k IOPS
>  
> Our setup is:
> 3 nodes with 36 OSDs and 18 SSDs (one SSD for two OSDs); each node has 64GB 
> memory and 2x 6-core CPUs
> 4 monitors running on other servers
> 40Gbit InfiniBand with IPoIB
> OpenStack, with qemu-kvm for the VMs
>  
> Any help would be appreciated
>  
> Thank you in advance.
>  
> Br,
> Tuomas
>  
> 
> 
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
