> On 14 January 2017 at 6:41, Christian Balzer <ch...@gol.com> wrote:
> 
> 
> 
> Hello,
> 
> On Fri, 13 Jan 2017 13:18:35 -0500 Mohammed Naser wrote:
> 
> > These Intel SSDs are more than capable of handling the workload, in 
> > addition, this cluster is used as an RBD backend for an OpenStack cluster. 
> >
> 
> I haven't tested the S3520s yet; being Intel's first 3D NAND offering,
> they are slightly slower than their predecessors in terms of BW and
> IOPS, but supposedly have a slightly lower latency, if the specs are to
> be believed.
> 
> Given the history of Intel DC S SSDs I think it is safe to assume that they
> use the same or a similar controller setup as their predecessors, meaning a
> large capacitor-backed (power-loss protected) cache, which enables them to
> deal correctly and quickly with SYNC and DIRECT writes. 
> 
> What would worry me slightly more (even at their 960GB size) is the endurance
> of 1 DWPD, which with journals inline comes down to 0.5, and with FS
> overhead and write amplification (which depends a lot on the type of operations)
> you're looking at something around 0.3 DWPD to base your expectations on.
> Mind, that still leaves you with about 9.6TB per day, which is a decent
> enough number, but only equates to about 112MB/s.
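
As a rough sanity check of how 0.3 DWPD turns into those figures, assuming 90 of
these 960GB OSDs and 3x replication (the replica count is an assumption, it is
not stated in this thread), the arithmetic looks roughly like this:

    # back-of-the-envelope endurance budget; every input here is an assumption
    drives=90; size_gb=960; effective_dwpd=0.333; replicas=3
    # client-visible TB/day the cluster can absorb within the endurance rating
    echo "scale=1; $drives * $size_gb * $effective_dwpd / $replicas / 1000" | bc
    # the same budget expressed as a sustained client write rate in MB/s
    echo "scale=0; $drives * $size_gb * 1000 * $effective_dwpd / $replicas / 86400" | bc

which comes out to roughly 9.6TB/day and ~110MB/s of sustained client writes.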
> 
> Finally, most people start by looking at bandwidth/throughput, only to
> discover eventually that it was IOPS they needed first and foremost.

Yes! Bandwidth isn't what people usually need; they need IOPS and low latency.

I see a lot of clusters doing 10k ~ 20k IOPS with somewhere around 1Gbit/s of 
traffic.
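
If you want rados bench to show you IOPS and latency rather than throughput, a
small-block run is a quick way to get that; a minimal sketch (the pool name is a
placeholder):

    # 4KiB writes, 16 in flight: look at the "Average IOPS" and latency lines
    # in the output rather than Bandwidth (MB/sec)
    rados bench -p <pool> 30 write -b 4096 -t 16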

Wido

> 
> Christian
> 
> > Sent from my iPhone
> > 
> > > On Jan 13, 2017, at 1:08 PM, Somnath Roy <somnath....@sandisk.com> wrote:
> > > 
> > > Also, there has been a lot of discussion in the community about SSDs not 
> > > being suitable for the Ceph write workload (with filestore), as some are 
> > > not good at O_DIRECT/O_DSYNC kinds of writes. Hopefully your SSDs tolerate that.
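
For reference, the check usually meant here is a single-threaded O_DIRECT/O_DSYNC
4k write test with fio, something along these lines (the device name is a
placeholder, and this writes to the raw device, so it destroys whatever is on it):

    # 4k sync/direct writes at queue depth 1, i.e. the filestore journal pattern
    fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=1 --iodepth=1 --runtime=60 --time_based \
        --group_reporting --name=journal-test

Roughly speaking, good datacenter SSDs sustain tens of MB/s or more in this test,
while many consumer drives collapse to a few MB/s, which is the behaviour this
warning is about.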
> > > 
> > > -----Original Message-----
> > > From: Somnath Roy
> > > Sent: Friday, January 13, 2017 10:06 AM
> > > To: 'Mohammed Naser'; Wido den Hollander
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: RE: [ceph-users] All SSD cluster performance
> > > 
> > > << Both OSDs are pinned to two cores on the system
> > > Is there any reason you are pinning OSDs like that? I would say for an 
> > > object workload there is no need to pin OSDs.
> > > With the configuration you mentioned, Ceph with 4M object PUTs should be 
> > > saturating your network first.
> > > 
> > > Have you run, say, a 4M object GET to see what BW you are getting?
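
In case it is useful, a 4M object GET run against the same pool would look
something like this (the pool name is a placeholder; the write run needs
--no-cleanup so the objects are still there to read back):

    # write 4M objects and keep them around for the read test
    rados bench -p <pool> 60 write -b 4194304 -t 16 --no-cleanup
    # sequential 4M reads of the objects written above
    rados bench -p <pool> 60 seq -t 16
    # remove the benchmark objects afterwards
    rados -p <pool> cleanup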
> > > 
> > > Thanks & Regards
> > > Somnath
> > > 
> > > -----Original Message-----
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> > > Mohammed Naser
> > > Sent: Friday, January 13, 2017 9:51 AM
> > > To: Wido den Hollander
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] All SSD cluster performance
> > > 
> > > 
> > >> On Jan 13, 2017, at 12:41 PM, Wido den Hollander <w...@42on.com> wrote:
> > >> 
> > >> 
> > >>> On 13 January 2017 at 18:39, Mohammed Naser 
> > >>> <mna...@vexxhost.com> wrote:
> > >>> 
> > >>> 
> > >>> 
> > >>>> On Jan 13, 2017, at 12:37 PM, Wido den Hollander <w...@42on.com> wrote:
> > >>>> 
> > >>>> 
> > >>>>> On 13 January 2017 at 18:18, Mohammed Naser 
> > >>>>> <mna...@vexxhost.com> wrote:
> > >>>>> 
> > >>>>> 
> > >>>>> Hi everyone,
> > >>>>> 
> > >>>>> We have a deployment with 90 OSDs at the moment, all SSD, which is not 
> > >>>>> quite hitting the performance it should be in my opinion. A `rados bench` 
> > >>>>> run gives numbers along these lines:
> > >>>>> 
> > >>>>> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
> > >>>>> Object prefix: benchmark_data_bench.vexxhost._30340
> > >>>>>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
> > >>>>>     0       0         0         0         0         0            -          0
> > >>>>>     1      16       158       142   568.513       568    0.0965336  0.0939971
> > >>>>>     2      16       287       271   542.191       516    0.0291494   0.107503
> > >>>>>     3      16       375       359    478.75       352    0.0892724   0.118463
> > >>>>>     4      16       477       461   461.042       408    0.0243493   0.126649
> > >>>>>     5      16       540       524   419.216       252     0.239123   0.132195
> > >>>>>     6      16       644       628    418.67       416     0.347606   0.146832
> > >>>>>     7      16       734       718   410.281       360    0.0534447   0.147413
> > >>>>>     8      16       811       795   397.487       308    0.0311927    0.15004
> > >>>>>     9      16       879       863   383.537       272    0.0894534   0.158513
> > >>>>>    10      16       980       964   385.578       404    0.0969865   0.162121
> > >>>>>    11       3       981       978   355.613        56     0.798949   0.171779
> > >>>>> Total time run:         11.063482
> > >>>>> Total writes made:      981
> > >>>>> Write size:             4194304
> > >>>>> Object size:            4194304
> > >>>>> Bandwidth (MB/sec):     354.68
> > >>>>> Stddev Bandwidth:       137.608
> > >>>>> Max bandwidth (MB/sec): 568
> > >>>>> Min bandwidth (MB/sec): 56
> > >>>>> Average IOPS:           88
> > >>>>> Stddev IOPS:            34
> > >>>>> Max IOPS:               142
> > >>>>> Min IOPS:               14
> > >>>>> Average Latency(s):     0.175273
> > >>>>> Stddev Latency(s):      0.294736
> > >>>>> Max latency(s):         1.97781
> > >>>>> Min latency(s):         0.0205769
> > >>>>> Cleaning up (deleting benchmark objects)
> > >>>>> Clean up completed and total clean up time: 3.895293
> > >>>>> 
> > >>>>> We’ve verified the network by running `iperf` across both the replication 
> > >>>>> and public networks, and it resulted in 9.8Gb/s (10G links for both).  
> > >>>>> The machine that’s running the benchmark doesn’t even saturate its 
> > >>>>> port.  The SSDs are S3520 960GB drives which we’ve benchmarked with 
> > >>>>> fio etc., and they can handle the load.  At this point I’m not really 
> > >>>>> sure where to look next.  Is anyone running all-SSD clusters who might 
> > >>>>> be able to share their experience?
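
(For anyone reproducing that network check, the usual pattern is an iperf server
on one node and a client on another; the target address below is a placeholder.)

    # on the receiving node
    iperf -s
    # on the sending node; -P 4 runs four parallel streams
    iperf -c <target> -t 30 -P 4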
> > >>>> 
> > >>>> I suggest that you search a bit on the ceph-users list since this 
> > >>>> topic has been discussed multiple times in the past and even recently.
> > >>>> 
> > >>>> Ceph isn't your average storage system and you have to keep that in 
> > >>>> mind. Nothing is free in this world. Ceph provides excellent 
> > >>>> consistency and distribution of data, but that also means that you 
> > >>>> have things like network and CPU latency.
> > >>>> 
> > >>>> However, I suggest you look up a few threads on this list which have 
> > >>>> valuable tips.
> > >>>> 
> > >>>> Wido
> > >>> 
> > >>> Thanks for the reply. I’ve actually done quite a lot of research and 
> > >>> gone through many of the previous posts.  While I agree 100% with 
> > >>> your statement, I’ve found that other people with similar setups have 
> > >>> been able to reach numbers that I cannot, which leads me to believe 
> > >>> that there is actually an issue here.  They have been able to max 
> > >>> out at 1200 MB/s, which is the maximum of their benchmarking host.  We’d 
> > >>> like to reach that, and I think that given the specifications of the 
> > >>> cluster, it can do so with no problems.
> > >> 
> > >> A few tips:
> > >> 
> > >> - Disable all logging in Ceph (debug_osd, debug_ms, debug_auth, etc.)
> > > 
> > > All logging is configured to default settings, should those be turned 
> > > down?
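
For what it's worth, the way this advice is usually applied is a block like the
following in ceph.conf on the OSD nodes; this is a sketch of the commonly
circulated settings, not an exhaustive list:

    [global]
    debug osd = 0/0
    debug ms = 0/0
    debug auth = 0/0
    debug filestore = 0/0
    debug journal = 0/0

The same values can also be injected at runtime with `ceph tell osd.* injectargs`
if you don't want to restart the OSDs.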
> > > 
> > >> - Disable power saving on the CPUs
> > > 
> > > All disabled as well, everything running on `performance` mode.
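
For anyone else checking the same thing, verifying and forcing that typically
looks like this (cpupower comes from the kernel tools package; exact paths and
package names vary by distro):

    # show the current governor for every core
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
    # force the performance governor on all cores
    cpupower frequency-set -g performance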
> > > 
> > >> 
> > >> Can you also share how the 90 OSDs are distributed in the cluster and 
> > >> what CPUs you have?
> > > 
> > > There are 45 machines with 2 OSDs each.  The servers they’re located on 
> > > have, on average, 24-core ~3GHz Intel CPUs.  Both OSDs are pinned to two 
> > > cores on the system.
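
(If it helps with comparing setups, the affinity of running OSDs can be inspected
with taskset; the core list below is only an illustration of how such pinning is
typically done, not a recommendation.)

    # show the current CPU affinity of each ceph-osd process
    for pid in $(pidof ceph-osd); do taskset -cp "$pid"; done
    # pin one OSD process to cores 0 and 1 (<pid> is a placeholder)
    taskset -cp 0,1 <pid>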
> > > 
> > >> 
> > >> Wido
> > >> 
> > >>> 
> > >>>>> 
> > >>>>> Thanks,
> > >>>>> Mohammed
> > >>> 
> > > 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer                
> ch...@gol.com         Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
