With my S3500 drives in my test cluster, the latest master branch gave me
an almost 2x increase in performance compared to just a month or two ago.
There look to be some really nice things coming in Jewel around SSD
performance. My drives are now 80-85% busy doing about 10-12K IOPS when
doing 4K fio to libRBD.

Sent from a mobile device, please excuse any typos.
On Feb 24, 2016 8:10 PM, "Christian Balzer" <[email protected]> wrote:

>
> Hello,
>
> For posterity and of course to ask some questions, here are my experiences
> with a pure SSD pool.
>
> SW: Debian Jessie, Ceph Hammer 0.94.5.
>
> HW:
> 2 nodes (thus replication of 2) with each:
> 2x E5-2623 CPUs
> 64GB RAM
> 4x DC S3610 800GB SSDs
> Infiniband (IPoIB) network
>
> Ceph: no tuning or significant/relevant config changes, OSD FS is Ext4,
> Ceph journal is inline (journal file).
>
> Performance:
> A test run with "rados -p cache  bench 30 write -t 32" (4MB blocks) gives
> me about 620MB/s, the storage nodes are I/O bound (all SSDs are 100% busy
> according to atop) and this meshes nicely with the speeds I saw when
> testing the individual SSDs with fio before involving Ceph.
>
> To elaborate on that, an individual SSD of that type can do about 500MB/s
> sequential writes, so ideally you would see 1GB/s writes with Ceph
> (500*8/2(replication)/2(journal on same disk)).
> However my experience tells me that other activities (FS journals, leveldb
> PG updates, etc) impact things as well.
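
For readers following along, the back-of-the-envelope ceiling quoted above can be written out as a short sketch. The function name and the "journal factor" parameter are illustrative only; the numbers (500MB/s per SSD, 8 SSDs, replication 2, inline journal, 620MB/s observed) are taken from the message.

```python
# Rough ceiling for sequential writes, using the figures quoted in the message.
# This is only the arithmetic from the mail, not a performance model of Ceph.

def ideal_write_mb_s(per_ssd_mb_s, ssd_count, replication, journal_factor):
    """Aggregate raw SSD bandwidth divided by the known write multipliers."""
    return per_ssd_mb_s * ssd_count / replication / journal_factor

# 2 nodes x 4 SSDs = 8 SSDs; journal on the same disk doubles every write.
ideal = ideal_write_mb_s(per_ssd_mb_s=500, ssd_count=8, replication=2, journal_factor=2)
observed = 620  # MB/s reported by rados bench with 4MB blocks

efficiency = observed / ideal
print(f"ideal: {ideal:.0f} MB/s, observed: {observed} MB/s ({efficiency:.0%} of ideal)")
```

The gap between the two numbers is what the message attributes to FS journals, leveldb PG updates and similar overhead.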
>
> A test run with "rados -p cache  bench 30 write -t 32 -b 4096" (4KB
> blocks) gives me about 7200 IOPS, the SSDs are about 40% busy.
> All OSD processes are using about 2 cores and the OS another 2, but that
> leaves about 6 cores unused (MHz on all cores scales to max during the
> test run).
> Closer inspection with all CPUs being displayed in atop shows that no
> single core is fully used, they all average around 40% and even the
> busiest ones (handling IRQs) still have ample capacity available.
> I'm wondering if this is an indication of insufficient parallelism or if it's
> latency of sorts.
> I'm aware of the many tuning settings for SSD based OSDs, however I was
> expecting to run into a CPU wall first and foremost.
>
>
> Write amplification:
> 10 second rados bench with 4MB blocks, 6348MB written in total.
> nand-writes per SSD:118*32MB=3776MB.
> 30208MB total written to all SSDs.
> Amplification:4.75
>
> Very close to what you would expect with a replication of 2 and journal on
> same disk.
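
The 4MB amplification figure above can be reproduced with a few lines. This is a sketch under stated assumptions: the 32MB unit per NAND-write count and the 8 SSDs (2 nodes x 4) come from the message, while the function name and the "expected" factor of 4 (replication 2 times journal 2) are just a restatement of the reasoning in the mail.

```python
# Write amplification = data written to NAND on all SSDs / data written by the client.

def write_amplification(client_mb, nand_units_per_ssd, unit_mb, ssd_count):
    """Return (total MB written to NAND across all SSDs, amplification factor)."""
    total_nand_mb = nand_units_per_ssd * unit_mb * ssd_count
    return total_nand_mb, total_nand_mb / client_mb

# 10 second rados bench with 4MB blocks, figures from the message.
total_mb, amp = write_amplification(client_mb=6348, nand_units_per_ssd=118,
                                    unit_mb=32, ssd_count=8)

expected = 2 * 2  # replication of 2, journal on the same disk
print(f"total NAND writes: {total_mb} MB, amplification: {amp:.2f} "
      f"(expected about {expected}, overhead {amp / expected - 1:.0%})")
```

With these inputs the total matches the 30208MB quoted above, and the factor lands just above 4.75.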
>
>
> 10 second rados bench with 4KB blocks, 219MB written in total.
> nand-writes per SSD:41*32MB=1312MB.
> 10496MB total written to all SSDs.
> Amplification:48!!!
>
> Le ouch.
> In my use case with rbd cache on all VMs I expect writes to be rather
> large for the most part and not like this extreme example.
> But as I wrote the last time I did this kind of testing, this is an area
> where caveat emptor most definitely applies when planning and buying SSDs.
> And where the Ceph code could probably do with some attention.
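
To put the 4KB result in per-operation terms, the same figures can be turned into "KB written to NAND per 4KB client write". This is a rough sketch: it assumes 1MB = 1024KB and that every client write is exactly 4KB, and the variable names are made up for illustration.

```python
# 10 second rados bench with 4KB blocks, figures from the message.
client_mb = 219          # data written by the client
nand_units_per_ssd = 41  # NAND-write count increase per SSD
unit_mb = 32             # each count represents 32MB
ssd_count = 8            # 2 nodes x 4 SSDs
block_kb = 4             # assumed exact size of every client write

total_nand_mb = nand_units_per_ssd * unit_mb * ssd_count
amp = total_nand_mb / client_mb

# Assumption: 1MB = 1024KB, so the client issued client_mb * 1024 / 4 writes.
client_ops = client_mb * 1024 / block_kb
nand_kb_per_op = total_nand_mb * 1024 / client_ops

print(f"amplification: {amp:.1f}")
print(f"client writes: {client_ops:.0f}, NAND KB per 4KB write: {nand_kb_per_op:.0f}")
```

Seen this way, each 4KB client write costs on the order of 190KB of NAND writes across the cluster.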
>
> Regards,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> [email protected]           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>