Hi Sebastien,

That sounds promising. Did you enable the sharded ops to get this result?
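For reference, I mean the sharded op work queue that landed for Giant. If I
remember the option names right (worth double-checking against "ceph daemon
osd.0 config show" on your build), the knobs look something like this in
ceph.conf:

    [osd]
    # number of shards for the OSD op work queue
    osd_op_num_shards = 5
    # worker threads per shard (shards * threads = total op workers)
    osd_op_num_threads_per_shard = 2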
Cheers,
Dan

> On 02 Sep 2014, at 02:19, Sebastien Han <sebastien....@enovance.com> wrote:
>
> Mark and all, Ceph IOPS performance has definitely improved with Giant.
> With this version: ceph version 0.84-940-g3215c52
> (3215c520e1306f50d0094b5646636c02456c9df4) on Debian 7.6 with kernel 3.14-0.
>
> I got 6340 IOPS on a single OSD SSD (journal and data on the same
> partition), so basically twice the IOPS I was getting with Firefly.
>
> 4K random reads went from 12431 to 10201, so I'm a bit disappointed there.
>
> The SSD is still under-utilised:
>
> Device:  rrqm/s  wrqm/s  r/s   w/s      rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> sdp1     0.00    540.37  0.00  5902.30  0.00   47.14  16.36     0.87      0.15   0.00     0.15     0.07   40.15
> sdp2     0.00    0.00    0.00  4454.67  0.00   49.16  22.60     0.31      0.07   0.00     0.07     0.07   30.61
>
> Thanks a ton for all your comments and assistance, guys :).
>
> One last question for Sage (or others who might know): what's the status of
> the f2fs implementation? (Or maybe we are waiting for f2fs to provide atomic
> transactions?)
> I tried to run the OSD on f2fs, but ceph-osd mkfs got stuck on an xattr
> test:
>
> fremovexattr(10, "user.test@5848273") = 0
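> For completeness, the fio job behind these numbers looks roughly like the
> one below (fio rbd engine, 4K random writes, iodepth 32). I'm writing it
> from memory, so treat the exact options and the pool/image names as
> placeholders rather than the real file:
>
>     [global]
>     ioengine=rbd
>     clientname=admin
>     pool=rbd
>     rbdname=fio-test
>     invalidate=0
>     rw=randwrite
>     bs=4k
>
>     [rbd_iodepth32]
>     iodepth=32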
> On 01 Sep 2014, at 11:13, Sebastien Han <sebastien....@enovance.com> wrote:
>
>> Mark, thanks a lot for experimenting with this for me.
>> I'm gonna try master soon and will tell you how much I can get.
>>
>> It's interesting to see that using 2 SSDs brings more performance, even
>> though both SSDs are under-utilised…
>> They should be able to sustain both loads at the same time (journal and
>> osd data).
>>
>> On 01 Sep 2014, at 09:51, Somnath Roy <somnath....@sandisk.com> wrote:
>>
>>> As I said, 107K with the IOs served from memory, not hitting the disk.
>>>
>>> From: Jian Zhang [mailto:amberzhan...@gmail.com]
>>> Sent: Sunday, August 31, 2014 8:54 PM
>>> To: Somnath Roy
>>> Cc: Haomai Wang; ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS
>>>
>>> Somnath,
>>> on the small-workload performance: 107K is higher than the theoretical
>>> IOPS of 520, any idea why?
>>>
>>>>> Single client is ~14K iops, but scaling as number of clients increases.
>>>>> 10 clients ~107K iops. ~25 cpu cores are used.
>>>
>>> 2014-09-01 11:52 GMT+08:00 Jian Zhang <amberzhan...@gmail.com>:
>>> Somnath,
>>> on the small workload performance,
>>>
>>> 2014-08-29 14:37 GMT+08:00 Somnath Roy <somnath....@sandisk.com>:
>>>
>>> Thanks Haomai!
>>>
>>> Here is some of the data from my setup.
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Setup:
>>> ------
>>> 32-core cpu with HT enabled, 128 GB RAM, one SSD (both journal and data)
>>> -> one OSD. 5 client machines with 12-core cpus, each running two
>>> instances of ceph_smalliobench (10 clients total). Network is 10GbE.
>>>
>>> Workload:
>>> ---------
>>> Small workload – 20K objects of 4K size, io_size is also 4K RR. The
>>> intent is to serve the IOs from memory so that it can uncover the
>>> performance problems within a single OSD.
>>>
>>> Results from Firefly:
>>> ---------------------
>>> Single-client throughput is ~14K iops, but as the number of clients
>>> increases the aggregated throughput is not increasing. 10 clients ~15K
>>> iops. ~9-10 cpu cores are used.
>>>
>>> Results with latest master:
>>> ---------------------------
>>> Single client is ~14K iops, but scaling as the number of clients
>>> increases. 10 clients ~107K iops. ~25 cpu cores are used.
>>>
>>> ----------------------------------------------------------------------
>>>
>>> More realistic workload:
>>> ------------------------
>>> Let's see how it performs while >90% of the IOs are served from disks.
>>>
>>> Setup:
>>> ------
>>> 40-cpu-core server as a cluster node (single-node cluster) with 64 GB
>>> RAM. 8 SSDs -> 8 OSDs. One similar node for monitor and rgw. Another
>>> node for the client running fio/vdbench. 4 rbds are configured with the
>>> 'noshare' option (more on that below). 40GbE network.
>>>
>>> Workload:
>>> ---------
>>> 8 SSDs are populated, so 8 * 800 GB = ~6.4 TB of data. io_size = 4K RR.
>>>
>>> Results from Firefly:
>>> ---------------------
>>> Aggregated output while 4 rbd clients stress the cluster in parallel is
>>> ~20-25K IOPS; ~8-10 cpu cores used (maybe less, I can't remember
>>> precisely).
>>>
>>> Results from latest master:
>>> ---------------------------
>>> Aggregated output while 4 rbd clients stress the cluster in parallel is
>>> ~120K IOPS; cpu is 7% idle, i.e. ~37-38 cpu cores in use.
>>>
>>> Hope this helps.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: Haomai Wang [mailto:haomaiw...@gmail.com]
>>> Sent: Thursday, August 28, 2014 8:01 PM
>>> To: Somnath Roy
>>> Cc: Andrey Korolyov; ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS
>>>
>>> Hi Roy,
>>>
>>> I already scanned your merged code for "fdcache" and "optimizing
>>> lfn_find/lfn_open" — could you share some performance numbers for those
>>> changes? I fully agree with your direction; do you have any update on it?
>>>
>>> As for the messenger level, I have some very early work on it
>>> (https://github.com/yuyuyu101/ceph/tree/msg-event): it contains a new
>>> messenger implementation which supports different event mechanisms.
>>> It looks like at least one more week to make it work.
>>>
>>> On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy <somnath....@sandisk.com> wrote:
>>>
>>>> Yes, from what I saw the messenger-level bottleneck is still huge!
>>>> Hopefully the RDMA messenger will resolve that, and the performance
>>>> gain should be significant for reads (on SSDs). For writes we need to
>>>> uncover the OSD bottlenecks first to take advantage of the improved
>>>> upstream.
>>>> What I experienced is that until you remove the very last bottleneck,
>>>> the performance improvement will not be visible, and that can be
>>>> confusing because you might think that the upstream improvement you
>>>> made is not valid (which is not the case).
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>> -----Original Message-----
>>>> From: Andrey Korolyov [mailto:and...@xdel.ru]
>>>> Sent: Thursday, August 28, 2014 12:57 PM
>>>> To: Somnath Roy
>>>> Cc: David Moreau Simard; Mark Nelson; ceph-users@lists.ceph.com
>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS
>>>>
>>>> On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy <somnath....@sandisk.com> wrote:
>>>>
>>>>> Nope, this will not be backported to Firefly, I guess.
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>
>>>> Thanks for sharing this — the first thing that came to mind when I
>>>> looked at this thread was your patches :)
>>>>
>>>> If Giant incorporates them, both the RDMA support and these patches
>>>> should give a huge performance boost for RDMA-enabled Ceph back-end
>>>> networks.
>>>
>>> --
>>> Best Regards,
>>>
>>> Wheat
>
> Cheers.
> ––––
> Sébastien Han
> Cloud Architect
>
> "Always give 100%. Unless you're giving blood."
>
> Phone: +33 (0)1 49 70 99 72
> Mail: sebastien....@enovance.com
> Address: 11 bis, rue Roquépine - 75008 Paris
> Web: www.enovance.com - Twitter: @enovance

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com