Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Sebastien Han Tue, 02 Sep 2014 06:25:58 -0700

Well, the last time I ran two processes in parallel, each client got half of 
the total available, so about 1.7k IOPS per client.


On 02 Sep 2014, at 15:19, Alexandre DERUMIER <[email protected]> wrote:

> 
> Do you get the same results if you launch 2 fio benchmarks in parallel on 2 
> different rbd volumes?
> 
> 
> ----- Original message -----
> 
> From: "Sebastien Han" <[email protected]>
> To: "Cédric Lemarchand" <[email protected]>
> Cc: "Alexandre DERUMIER" <[email protected]>, [email protected]
> Sent: Tuesday, 2 September 2014 13:59:13
> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
> IOPS
> 
> @Dan, oops, my bad, I forgot to use these settings. I’ll try again and see how 
> much I can get on the read performance side.
> @Mark, thanks again, and yes, I believe that some hardware variance explains 
> our different results. I won’t say the deviation is negligible, but the results 
> are close enough to say that we hit the same limitations (at the Ceph level).
> @Cédric, yes I did, and what fio showed was consistent with the iostat 
> output; the same goes for disk utilisation.
> 
> 
> On 02 Sep 2014, at 12:44, Cédric Lemarchand <[email protected]> wrote:
> 
>> Hi Sebastian,
>> 
>>> On 2 Sep 2014, at 10:41, Sebastien Han <[email protected]> wrote:
>>> 
>>> Hey,
>>> 
>>> Well, I ran an fio job that simulates (more or less) what Ceph is doing 
>>> (journal writes with dsync and O_DIRECT) and the SSD gave me 29K IOPS too.
>>> I could do this, but to me it definitely looks like a major waste, since we 
>>> don’t even get a third of the SSD performance.
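
A journal-style test like the one described above can be sketched as a small Python helper that assembles an fio command line. The device path, job name, block size and runtime are placeholders, not values from this thread. `--sync=1` asks fio for synchronous (O_SYNC) writes, which only approximates the journal's dsync behaviour. The helper only builds the command; running it against a raw device would destroy the data on it.

```python
import shlex


def journal_fio_cmd(device, runtime_s=60, block_size="4k"):
    """Build an fio command for sequential, direct, synchronous small writes.

    All defaults are illustrative assumptions, not settings from the thread.
    """
    return [
        "fio",
        "--name=journal-test",      # hypothetical job name
        f"--filename={device}",     # WARNING: a raw device would be overwritten
        "--direct=1",               # O_DIRECT: bypass the page cache
        "--sync=1",                 # synchronous writes (O_SYNC)
        "--rw=write",               # sequential writes, like a journal
        f"--bs={block_size}",
        "--numjobs=1",
        "--iodepth=1",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; /dev/sdX is a placeholder.
    print(shlex.join(journal_fio_cmd("/dev/sdX")))
```

The IOPS figure fio reports for such a job can then be compared with what the OSD delivers on the same device.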
>> 
>> Did you check whether the raw SSD IOPS (using iostat -x, for example) show 
>> the same results during the fio bench?
>> 
>> Cheers
>> 
>>> 
>>>> On 02 Sep 2014, at 09:38, Alexandre DERUMIER <[email protected]> wrote:
>>>> 
>>>> Hi Sebastien,
>>>> 
>>>>>> I got 6340 IOPS on a single OSD SSD. (journal and data on the same 
>>>>>> partition).
>>>> 
>>>> Wouldn't it be better to have 2 partitions, 1 for journal and 1 for data?
>>>> 
>>>> (I'm thinking about filesystem write syncs)
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- Original message -----
>>>> 
>>>> From: "Sebastien Han" <[email protected]>
>>>> To: "Somnath Roy" <[email protected]>
>>>> Cc: [email protected]
>>>> Sent: Tuesday, 2 September 2014 02:19:16
>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 
>>>> 2K IOPS
>>>> 
>>>> Mark and all, Ceph IOPS performance has definitely improved with Giant.
>>>> With this version: ceph version 0.84-940-g3215c52 
>>>> (3215c520e1306f50d0094b5646636c02456c9df4) on Debian 7.6 with Kernel 
>>>> 3.14-0.
>>>> 
>>>> I got 6340 IOPS on a single OSD SSD. (journal and data on the same 
>>>> partition).
>>>> So basically twice the amount of IOPS that I was getting with Firefly. 
>>>> 
>>>> Rand reads 4k went from 12431 to 10201, so I’m a bit disappointed here.
>>>> 
>>>> The SSD is still under-utilised:
>>>> 
>>>> Device: rrqm/s wrqm/s  r/s     w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>>>> sdp1      0.00 540.37 0.00 5902.30  0.00 47.14    16.36     0.87  0.15    0.00    0.15  0.07 40.15
>>>> sdp2      0.00   0.00 0.00 4454.67  0.00 49.16    22.60     0.31  0.07    0.00    0.07  0.07 30.61
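
To make the under-utilisation in the iostat output above easier to read, here is a small Python sketch that parses the two rows and derives the average request size and an estimated number of requests in flight. It assumes the usual `iostat -x` units (avgrq-sz in 512-byte sectors, await in milliseconds); the function names are made up for illustration.

```python
# iostat -x rows copied from the message above (device, then 13 numeric columns).
HEADER = ("rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util").split()
ROWS = """\
sdp1 0.00 540.37 0.00 5902.30 0.00 47.14 16.36 0.87 0.15 0.00 0.15 0.07 40.15
sdp2 0.00 0.00 0.00 4454.67 0.00 49.16 22.60 0.31 0.07 0.00 0.07 0.07 30.61
"""


def parse(rows):
    """Map each device name to a dict of column name -> float."""
    stats = {}
    for line in rows.splitlines():
        name, *values = line.split()
        stats[name] = dict(zip(HEADER, map(float, values)))
    return stats


def derived(s):
    """Derive request size and queue depth from one device's columns."""
    return {
        # avgrq-sz is assumed to be in 512-byte sectors
        "avg_request_kib": s["avgrq-sz"] * 512 / 1024,
        # Little's law: requests in flight = arrival rate * time in system
        "est_queue_depth": (s["r/s"] + s["w/s"]) * s["await"] / 1000.0,
    }


if __name__ == "__main__":
    for dev, s in parse(ROWS).items():
        d = derived(s)
        print(f"{dev}: {d['avg_request_kib']:.2f} KiB/request, "
              f"~{d['est_queue_depth']:.2f} in flight, {s['%util']:.2f}% util")
```

With these numbers both partitions average well under one request in flight, which is in line with the reported avgqu-sz (0.87 and 0.31) and the low %util.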
>>>> 
>>>> Thanks a ton for all your comments and assistance guys :).
>>>> 
>>>> One last question for Sage (or others who might know): what’s the status 
>>>> of the F2FS implementation? (Or maybe we are waiting for F2FS to provide 
>>>> atomic transactions?)
>>>> I tried to run the OSD on F2FS; however, ceph-osd mkfs got stuck on an xattr 
>>>> test:
>>>> 
>>>> fremovexattr(10, "user.test@5848273") = 0
>>>> 
>>>>> On 01 Sep 2014, at 11:13, Sebastien Han <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>> Mark, thanks a lot for experimenting with this for me.
>>>>> I’m going to try master soon and will tell you how much I can get.
>>>>> 
>>>>> It’s interesting to see that using 2 SSDs brings more performance, 
>>>>> even though both SSDs are under-utilized…
>>>>> They should be able to sustain both loads at the same time (journal and 
>>>>> OSD data).
>>>>> 
>>>>>> On 01 Sep 2014, at 09:51, Somnath Roy <[email protected]> wrote:
>>>>>> 
>>>>>> As I said, the 107K is with IOs served from memory, not hitting the disk.
>>>>>> 
>>>>>> From: Jian Zhang [mailto:[email protected]]
>>>>>> Sent: Sunday, August 31, 2014 8:54 PM
>>>>>> To: Somnath Roy
>>>>>> Cc: Haomai Wang; [email protected]
>>>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 
>>>>>> 3, 2K IOPS
>>>>>> 
>>>>>> Somnath,
>>>>>> on the small workload performance, 107k is higher than the theoretical 
>>>>>> IOPS of 520, any idea why?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>>> Single client is ~14K iops, but scaling as number of clients 
>>>>>>>> increases. 10 clients ~107K iops. ~25 cpu cores are used.
>>>>>> 
>>>>>> 
>>>>>> 2014-09-01 11:52 GMT+08:00 Jian Zhang <[email protected]>:
>>>>>> Somnath,
>>>>>> on the small workload performance,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 2014-08-29 14:37 GMT+08:00 Somnath Roy <[email protected]>:
>>>>>> 
>>>>>> Thanks Haomai !
>>>>>> 
>>>>>> Here is some of the data from my setup.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>> 
>>>>>> Set up:
>>>>>> 
>>>>>> --------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 32-core CPU with HT enabled, 128 GB RAM, one SSD (both journal and data) 
>>>>>> -> one OSD. 5 client machines with 12-core CPUs, each running two instances 
>>>>>> of ceph_smalliobench (10 clients total). Network is 10GbE.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Workload:
>>>>>> 
>>>>>> -------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Small workload: 20K objects of 4K each, and io_size is also 4K, random 
>>>>>> read (RR). The intent is to serve the IOs from memory so that the test 
>>>>>> can uncover performance problems within a single OSD.
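
As a quick sanity check on the intent stated above, the working set can be compared with the node's RAM. This sketch assumes "4K" means 4 KiB and "128 GB" means 128 GiB, and it ignores any per-object overhead; the helper name is made up for illustration.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def working_set_bytes(num_objects, object_size_bytes):
    """Total payload size of the objects touched by the benchmark."""
    return num_objects * object_size_bytes


small = working_set_bytes(20_000, 4 * KIB)  # 20K objects of 4K each
ram = 128 * GIB                             # RAM on the OSD node (assumed GiB)

if __name__ == "__main__":
    print(f"working set: {small / MIB:.1f} MiB")
    print(f"fraction of RAM: {small / ram:.6f}")
```

The data set is tiny compared with the available memory, which is why the reads in this test can be served without touching the SSD.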
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Results from Firefly:
>>>>>> 
>>>>>> --------------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Single-client throughput is ~14K IOPS, but as the number of clients 
>>>>>> increases the aggregate throughput does not increase: 10 clients give 
>>>>>> ~15K IOPS. ~9-10 CPU cores are used.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Result with latest master:
>>>>>> 
>>>>>> ------------------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Single client is ~14K IOPS, but it scales as the number of clients 
>>>>>> increases: 10 clients give ~107K IOPS. ~25 CPU cores are used.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> More realistic workload:
>>>>>> 
>>>>>> -----------------------------
>>>>>> 
>>>>>> Let’s see how it performs when > 90% of the IOs are served from 
>>>>>> disks.
>>>>>> 
>>>>>> Setup:
>>>>>> 
>>>>>> -------
>>>>>> 
>>>>>> A 40-core server as the cluster node (single-node cluster) with 64 GB 
>>>>>> RAM. 8 SSDs -> 8 OSDs. One similar node for the monitor and rgw. Another 
>>>>>> node for the client running fio/vdbench. 4 rbds are configured with the 
>>>>>> ‘noshare’ option. 40 GbE network.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Workload:
>>>>>> 
>>>>>> ------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 8 SSDs are populated, so 8 * 800GB = ~6.4 TB of data. io_size = 4K RR.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Results from Firefly:
>>>>>> 
>>>>>> ------------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Aggregate output with 4 rbd clients stressing the cluster in parallel 
>>>>>> is ~20-25K IOPS; CPU cores used: ~8-10 (maybe fewer, I can’t remember 
>>>>>> precisely).
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Results from latest master:
>>>>>> 
>>>>>> --------------------------------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Aggregate output with 4 rbd clients stressing the cluster in parallel 
>>>>>> is ~120K IOPS; CPU is 7% idle, i.e. ~37-38 CPU cores are in use.
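
The figures reported in this message can be put side by side with a few lines of Python. This is only arithmetic on the approximate numbers quoted above; the function names are illustrative, and the Firefly range of 20-25K IOPS is treated as two bounds.

```python
def speedup(before, after):
    """Ratio of the new throughput to the old one."""
    return after / before


def busy_cores(total_cores, idle_fraction):
    """Cores in use on a node that reports a given idle fraction."""
    return total_cores * (1.0 - idle_fraction)


# In-memory test, 10 clients: ~15K IOPS (Firefly) vs ~107K IOPS (master)
mem_gain = speedup(15_000, 107_000)
# Disk-bound test, 4 rbd clients: ~20-25K IOPS (Firefly) vs ~120K IOPS (master)
disk_gain_low = speedup(25_000, 120_000)
disk_gain_high = speedup(20_000, 120_000)
# 40-core node reported as 7% idle
cores = busy_cores(40, 0.07)

if __name__ == "__main__":
    print(f"in-memory gain: {mem_gain:.1f}x")
    print(f"disk-bound gain: {disk_gain_low:.1f}x to {disk_gain_high:.1f}x")
    print(f"cores in use: {cores:.1f} of 40")
```

The 37.2 cores computed from the 7% idle figure agrees with the "~37-38 cores" stated in the message.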
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Hope this helps.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thanks & Regards
>>>>>> 
>>>>>> Somnath
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Haomai Wang [mailto:[email protected]]
>>>>>> Sent: Thursday, August 28, 2014 8:01 PM
>>>>>> To: Somnath Roy
>>>>>> Cc: Andrey Korolyov; [email protected]
>>>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 
>>>>>> 3, 2K IOPS
>>>>>> 
>>>>>> 
>>>>>> Hi Roy,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> I have already looked through your merged code for "fdcache" and 
>>>>>> "optimizing for lfn_find/lfn_open". Could you share some performance 
>>>>>> improvement data for it? I fully agree with your direction; do you have 
>>>>>> any update on it?
>>>>>> 
>>>>>> As for the messenger level, I have some very early work on it 
>>>>>> (https://github.com/yuyuyu101/ceph/tree/msg-event). It contains a new 
>>>>>> messenger implementation which supports different event mechanisms.
>>>>>> 
>>>>>> It looks like it will take at least one more week to make it work.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy <[email protected]> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Yes, from what I saw, the messenger-level bottleneck is still huge!
>>>>>>> Hopefully the RDMA messenger will resolve that, and the performance gain 
>>>>>>> will be significant for reads (on SSDs). For writes we need to uncover 
>>>>>>> the OSD bottlenecks first to take advantage of the improved upstream.
>>>>>>> What I experienced is that until you remove the very last bottleneck, the 
>>>>>>> performance improvement will not be visible, and that can be confusing 
>>>>>>> because you might think that the upstream improvement you made is not 
>>>>>>> valid (which is not the case).
>>>>>>> 
>>>>>>> Thanks & Regards
>>>>>>> Somnath
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Andrey Korolyov [mailto:[email protected]]
>>>>>>> Sent: Thursday, August 28, 2014 12:57 PM
>>>>>>> To: Somnath Roy
>>>>>>> Cc: David Moreau Simard; Mark Nelson; [email protected]
>>>>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go
>>>>>>> over 3, 2K IOPS
>>>>>>> 
>>>>>>> On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy <[email protected]> 
>>>>>>> wrote:
>>>>>>>> Nope, this will not be backported to Firefly, I guess.
>>>>>>>> 
>>>>>>>> Thanks & Regards
>>>>>>>> Somnath
>>>>>>> 
>>>>>>> Thanks for sharing this; the first thing that came to mind when I looked 
>>>>>>> at this thread was your patches :)
>>>>>>> 
>>>>>>> If Giant incorporates them, both the RDMA support and those patches should 
>>>>>>> give a huge performance boost for RDMA-enabled Ceph backnets.
>>>>>>> 
>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>> 
>>>>>> --
>>>>>> Best Regards,
>>>>>> Wheat
>>>>>> 
>>>>> 
>>>>> 
>>>>> Cheers.
>>>>> ––––
>>>>> Sébastien Han
>>>>> Cloud Architect
>>>>> 
>>>>> "Always give 100%. Unless you're giving blood."
>>>>> 
>>>>> Phone: +33 (0)1 49 70 99 72
>>>>> Mail: [email protected]
>>>>> Address : 11 bis, rue Roquépine - 75008 Paris
>>>>> Web : www.enovance.com - Twitter : @enovance
>>>>> 


Cheers.
–––– 
Sébastien Han 
Cloud Architect 

"Always give 100%. Unless you're giving blood."

Phone: +33 (0)1 49 70 99 72 
Mail: [email protected] 
Address : 11 bis, rue Roquépine - 75008 Paris
Web : www.enovance.com - Twitter : @enovance
