Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Alexandre DERUMIER Tue, 02 Sep 2014 06:19:44 -0700

Do you get the same results if you launch 2 fio benchmarks in parallel on 2
different rbd volumes?
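
For anyone who wants to try the parallel test, here is a minimal sketch of how it could be scripted. It assumes fio was built with the rbd ioengine; the pool name "rbd", the client name "admin", the volume names "vol1"/"vol2", the random-write pattern and the queue depth are placeholders of mine, not values taken from this thread.

```python
import shlex
import subprocess


def fio_rbd_cmd(volume, pool="rbd", client="admin", runtime_s=60):
    """Build an fio command line for a 4k random-write run on one rbd volume.

    Pool, client and volume names are hypothetical placeholders.
    """
    return [
        "fio",
        f"--name=bench-{volume}",
        "--ioengine=rbd",          # needs fio compiled with rbd support
        f"--clientname={client}",
        f"--pool={pool}",
        f"--rbdname={volume}",
        "--rw=randwrite",
        "--bs=4k",
        "--iodepth=32",            # assumed queue depth
        f"--runtime={runtime_s}",
        "--time_based",
    ]


def run_parallel(volumes, dry_run=True):
    """Start one fio process per volume and wait for all of them.

    With dry_run=True the commands are only returned as strings.
    """
    cmds = [fio_rbd_cmd(v) for v in volumes]
    if dry_run:
        return [shlex.join(c) for c in cmds]
    procs = [subprocess.Popen(c) for c in cmds]   # all start before any wait
    return [p.wait() for p in procs]


if __name__ == "__main__":
    for line in run_parallel(["vol1", "vol2"]):
        print(line)
```

If the summed IOPS of the two runs is clearly higher than a single run, the limit is more likely on the client side than in the OSD.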



----- Original Message -----

From: "Sebastien Han" <[email protected]>
To: "Cédric Lemarchand" <[email protected]>
Cc: "Alexandre DERUMIER" <[email protected]>, [email protected]
Sent: Tuesday, 2 September 2014 13:59:13
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

@Dan, oops, my bad, I forgot to use these settings. I’ll try again and see how
much I can get on the read performance side.
@Mark, thanks again, and yes, I believe that due to some hardware variance we
have different results. I won’t say that the deviation is small, but the results
are close enough to say that we experience the same limitations (ceph level).
@Cédric, yes I did, and what fio was showing was consistent with the iostat
output; the same goes for disk utilisation.


On 02 Sep 2014, at 12:44, Cédric Lemarchand <[email protected]> wrote: 

> Hi Sebastian, 
> 
>> On 2 Sep 2014, at 10:41, Sebastien Han <[email protected]> wrote:
>> 
>> Hey, 
>> 
>> Well, I ran an fio job that simulates (more or less) what ceph is doing
>> (journal writes with dsync and o_direct) and the ssd gave me 29K IOPS too.
>> I could do this, but for me it definitely looks like a major waste since we
>> don’t even get a third of the ssd performance.
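
The exact fio job is not shown in the thread. A rough sketch of a journal-style test (small sequential writes, O_DIRECT, synchronous) might look like the following. The device path, block size, queue depth and runtime are my assumptions, and `--sync=1` asks fio for synchronous writes (O_SYNC on most engines), which only approximates dsync. Writing to a raw device destroys its contents.

```python
import shlex


def journal_fio_cmd(device, runtime_s=60):
    """Build an fio command approximating journal-style writes.

    'device' is a hypothetical scratch device; its data will be overwritten.
    """
    return [
        "fio",
        "--name=journal-test",
        f"--filename={device}",
        "--direct=1",        # bypass the page cache (O_DIRECT)
        "--sync=1",          # synchronous writes, approximating dsync
        "--rw=write",        # sequential writes, like a journal
        "--bs=4k",
        "--numjobs=1",
        "--iodepth=1",       # one outstanding write at a time
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing gets overwritten.
    print(shlex.join(journal_fio_cmd("/dev/sdX")))
```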
> 
> Did you check whether the raw ssd IOPS (using iostat -x for example) show the
> same results during the fio bench?
> 
> Cheers 
> 
>> 
>>> On 02 Sep 2014, at 09:38, Alexandre DERUMIER <[email protected]> wrote: 
>>> 
>>> Hi Sebastien, 
>>> 
>>>>> I got 6340 IOPS on a single OSD SSD. (journal and data on the same 
>>>>> partition). 
>>> 
>>> Shouldn't it be better to have 2 partitions, 1 for journal and 1 for data?
>>> 
>>> (I'm thinking about filesystem write syncs) 
>>> 
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>>
>>> From: "Sebastien Han" <[email protected]>
>>> To: "Somnath Roy" <[email protected]>
>>> Cc: [email protected]
>>> Sent: Tuesday, 2 September 2014 02:19:16
>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K
>>> IOPS
>>> 
>>> Mark and all, Ceph IOPS performance has definitely improved with Giant. 
>>> With this version: ceph version 0.84-940-g3215c52 
>>> (3215c520e1306f50d0094b5646636c02456c9df4) on Debian 7.6 with Kernel 
>>> 3.14-0. 
>>> 
>>> I got 6340 IOPS on a single OSD SSD. (journal and data on the same 
>>> partition). 
>>> So basically twice the amount of IOPS that I was getting with Firefly. 
>>> 
>>> Rand reads 4k went from 12431 to 10201, so I’m a bit disappointed here. 
>>> 
>>> The SSD is still under-utilised: 
>>> 
>>> Device:  rrqm/s  wrqm/s   r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>>> sdp1       0.00  540.37  0.00  5902.30   0.00  47.14     16.36      0.87   0.15     0.00     0.15   0.07  40.15
>>> sdp2       0.00    0.00  0.00  4454.67   0.00  49.16     22.60      0.31   0.07     0.00     0.07   0.07  30.61
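
To read these numbers more easily, here is a small sketch that parses the two iostat rows above and derives the average write size. It assumes avgrq-sz is reported in 512-byte sectors, as in the sysstat versions of that time; the helper names are mine.

```python
SECTOR_BYTES = 512  # assumption: avgrq-sz is expressed in 512-byte sectors

HEADER = ("rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util").split()

# Values copied from the iostat output quoted in this thread.
ROWS = {
    "sdp1": "0.00 540.37 0.00 5902.30 0.00 47.14 16.36 0.87 0.15 0.00 0.15 0.07 40.15",
    "sdp2": "0.00 0.00 0.00 4454.67 0.00 49.16 22.60 0.31 0.07 0.00 0.07 0.07 30.61",
}


def parse(row):
    """Map each iostat column name to its float value."""
    return dict(zip(HEADER, map(float, row.split())))


def summarize(stats):
    """Keep the fields relevant to the under-utilisation question."""
    return {
        "write_iops": stats["w/s"],
        "avg_request_kib": stats["avgrq-sz"] * SECTOR_BYTES / 1024,
        "util_pct": stats["%util"],
    }


if __name__ == "__main__":
    for dev, row in ROWS.items():
        print(dev, summarize(parse(row)))
```

Under that assumption the average request is roughly 8 KiB on sdp1 and 11 KiB on sdp2, with both partitions at or below about 40% utilisation.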
>>> 
>>> Thanks a ton for all your comments and assistance guys :). 
>>> 
>>> One last question for Sage (or others who might know): what’s the status of
>>> the F2FS implementation? (Or maybe we are waiting for F2FS to provide
>>> atomic transactions?)
>>> I tried to run the OSD on f2fs; however, ceph-osd mkfs got stuck on an xattr
>>> test:
>>> 
>>> fremovexattr(10, "user.test@5848273") = 0 
>>> 
>>>> On 01 Sep 2014, at 11:13, Sebastien Han <[email protected]> 
>>>> wrote: 
>>>> 
>>>> Mark, thanks a lot for experimenting with this for me.
>>>> I’m gonna try master soon and will tell you how much I can get.
>>>>
>>>> It’s interesting to see that using 2 SSDs brings more performance, even
>>>> though both SSDs are under-utilized…
>>>> They should be able to sustain both loads at the same time (journal and 
>>>> osd data). 
>>>> 
>>>>> On 01 Sep 2014, at 09:51, Somnath Roy <[email protected]> wrote: 
>>>>> 
>>>>> As I said, 107K is with IOs served from memory, not hitting the disk.
>>>>> 
>>>>> From: Jian Zhang [mailto:[email protected]] 
>>>>> Sent: Sunday, August 31, 2014 8:54 PM 
>>>>> To: Somnath Roy 
>>>>> Cc: Haomai Wang; [email protected] 
>>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 
>>>>> 3, 2K IOPS 
>>>>> 
>>>>> Somnath, 
>>>>> on the small workload performance, 107k is higher than the theoretical 
>>>>> IOPS of 520, any idea why? 
>>>>> 
>>>>> 
>>>>> 
>>>>>>> Single client is ~14K iops, but scaling as number of clients increases. 
>>>>>>> 10 clients ~107K iops. ~25 cpu cores are used. 
>>>>> 
>>>>> 
>>>>> 2014-09-01 11:52 GMT+08:00 Jian Zhang <[email protected]>: 
>>>>> Somnath, 
>>>>> on the small workload performance, 
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-08-29 14:37 GMT+08:00 Somnath Roy <[email protected]>: 
>>>>> 
>>>>> Thanks Haomai ! 
>>>>> 
>>>>> Here is some of the data from my setup. 
>>>>> 
>>>>> 
>>>>> 
>>>>> ----------------------------------------
>>>>>
>>>>> 
>>>>> Set up: 
>>>>> 
>>>>> -------- 
>>>>> 
>>>>> 
>>>>> 
>>>>> 32 core cpu with HT enabled, 128 GB RAM, one SSD (both journal and data) 
>>>>> -> one OSD. 5 client m/c with 12 core cpu and each running two instances 
>>>>> of ceph_smalliobench (10 clients total). Network is 10GbE. 
>>>>> 
>>>>> 
>>>>> 
>>>>> Workload: 
>>>>> 
>>>>> ------------- 
>>>>> 
>>>>> 
>>>>> 
>>>>> Small workload – 20K objects with 4K size and io_size is also 4K RR. The 
>>>>> intent is to serve the ios from memory so that it can uncover the 
>>>>> performance problems within single OSD. 
>>>>> 
>>>>> 
>>>>> 
>>>>> Results from Firefly: 
>>>>> 
>>>>> -------------------------- 
>>>>> 
>>>>> 
>>>>> 
>>>>> Single client throughput is ~14K iops, but as the number of clients
>>>>> increases the aggregated throughput does not increase. 10 clients ~15K
>>>>> iops. ~9-10 cpu cores are used.
>>>>> 
>>>>> 
>>>>> 
>>>>> Result with latest master: 
>>>>> 
>>>>> ------------------------------ 
>>>>> 
>>>>> 
>>>>> 
>>>>> Single client is ~14K iops, but scaling as number of clients increases. 
>>>>> 10 clients ~107K iops. ~25 cpu cores are used. 
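
To put the small-workload numbers side by side, here is a quick worked calculation of the speedup from 1 to 10 clients. It uses only the rounded figures quoted above (~14K, ~15K, ~107K), so the results are approximate.

```python
def scaling(single_client_iops, total_iops, clients):
    """Return (speedup over one client, efficiency relative to linear scaling)."""
    speedup = total_iops / single_client_iops
    return speedup, speedup / clients


if __name__ == "__main__":
    results = {
        "firefly": scaling(14_000, 15_000, 10),
        "master": scaling(14_000, 107_000, 10),
    }
    for name, (speedup, efficiency) in results.items():
        print(f"{name}: {speedup:.2f}x speedup, {efficiency:.0%} of linear")
```

With these inputs Firefly stays at about 1.07x (around 11% of linear scaling), while master reaches about 7.6x (around 76% of linear).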
>>>>> 
>>>>> 
>>>>> 
>>>>> ----------------------------------------
>>>>>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> More realistic workload: 
>>>>> 
>>>>> ----------------------------- 
>>>>> 
>>>>> Let’s see how it is performing while > 90% of the ios are served from 
>>>>> disks 
>>>>> 
>>>>> Setup: 
>>>>> 
>>>>> ------- 
>>>>> 
>>>>> 40 cpu core server as a cluster node (single node cluster) with 64 GB 
>>>>> RAM. 8 SSDs -> 8 OSDs. One similar node for monitor and rgw. Another node 
>>>>> for client running fio/vdbench. 4 rbds are configured with ‘noshare’ 
>>>>> option. 40 GbE network 
>>>>> 
>>>>> 
>>>>> 
>>>>> Workload: 
>>>>> 
>>>>> ------------ 
>>>>> 
>>>>> 
>>>>> 
>>>>> 8 SSDs are populated , so, 8 * 800GB = ~6.4 TB of data. Io_size = 4K RR. 
>>>>> 
>>>>> 
>>>>> 
>>>>> Results from Firefly: 
>>>>> 
>>>>> ------------------------ 
>>>>> 
>>>>> 
>>>>> 
>>>>> Aggregated output while 4 rbd clients stress the cluster in parallel
>>>>> is ~20-25K IOPS, cpu cores used ~8-10 cores (maybe less, I can’t remember
>>>>> precisely)
>>>>> 
>>>>> 
>>>>> 
>>>>> Results from latest master: 
>>>>> 
>>>>> -------------------------------- 
>>>>> 
>>>>> 
>>>>> 
>>>>> Aggregated output while 4 rbd clients stress the cluster in parallel
>>>>> is ~120K IOPS, cpu is 7% idle, i.e. ~37-38 cpu cores are used.
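
As a rough cross-check of the CPU figures for the disk-bound workload, the sketch below converts "7% idle" into busy cores and compares IOPS per core. It assumes the idle percentage applies to the 40 cores of the server mentioned in the setup; the Firefly range comes from the approximate values quoted above.

```python
TOTAL_CORES = 40  # from the setup description; assumed to be what "% idle" refers to


def busy_cores(idle_fraction, total_cores=TOTAL_CORES):
    """Number of cores' worth of CPU in use for a given idle fraction."""
    return (1.0 - idle_fraction) * total_cores


def iops_per_core(iops, cores):
    """Throughput delivered per busy core."""
    return iops / cores


if __name__ == "__main__":
    master_cores = busy_cores(0.07)
    print(f"master: {master_cores:.1f} cores busy, "
          f"{iops_per_core(120_000, master_cores):.0f} IOPS/core")
    # Firefly was reported as ~20-25K IOPS on ~8-10 cores, so give a range.
    low = iops_per_core(20_000, 10)
    high = iops_per_core(25_000, 8)
    print(f"firefly: {low:.0f} to {high:.0f} IOPS/core")
```

This gives about 37.2 busy cores on master, in line with the ~37-38 quoted. Per-core throughput is in the same range for both versions, so the gain appears to come mostly from being able to use more cores.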
>>>>> 
>>>>> 
>>>>> 
>>>>> Hope this helps. 
>>>>> 
>>>>> 
>>>>> 
>>>>> Thanks & Regards 
>>>>> 
>>>>> Somnath 
>>>>> 
>>>>> 
>>>>> 
>>>>> -----Original Message----- 
>>>>> From: Haomai Wang [mailto:[email protected]] 
>>>>> Sent: Thursday, August 28, 2014 8:01 PM 
>>>>> To: Somnath Roy 
>>>>> Cc: Andrey Korolyov; [email protected] 
>>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 
>>>>> 3, 2K IOPS 
>>>>> 
>>>>> 
>>>>> Hi Roy, 
>>>>> 
>>>>> 
>>>>> 
>>>>> I have already looked through your merged code for "fdcache" and
>>>>> "optimizing for lfn_find/lfn_open". Could you share some performance
>>>>> improvement data for it? I fully agree with your direction; do you have
>>>>> any update on it?
>>>>>
>>>>>
>>>>>
>>>>> As for the messenger level, I have some very early work on
>>>>> it (https://github.com/yuyuyu101/ceph/tree/msg-event); it contains a new
>>>>> messenger implementation which supports different event mechanisms.
>>>>>
>>>>> It looks like it will take at least one more week to make it work.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy <[email protected]> 
>>>>>> wrote: 
>>>>>> 
>>>>>> Yes, from what I saw, the messenger-level bottleneck is still huge!
>>>>>
>>>>>> Hopefully the RDMA messenger will resolve that and the performance gain
>>>>>> will be significant for reads (on SSDs). For writes we need to uncover
>>>>>> the OSD bottlenecks first to take advantage of the improved upstream.
>>>>>
>>>>>> My experience is that until you remove the very last bottleneck, the
>>>>>> performance improvement will not be visible, and that can be confusing
>>>>>> because you might think that the upstream improvement you made is not
>>>>>> valid (which is not the case).
>>>>> 
>>>>> 
>>>>>> Thanks & Regards 
>>>>> 
>>>>>> Somnath 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Andrey Korolyov [mailto:[email protected]]
>>>>>> Sent: Thursday, August 28, 2014 12:57 PM
>>>>>> To: Somnath Roy
>>>>>> Cc: David Moreau Simard; Mark Nelson; [email protected]
>>>>>> Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go
>>>>>> over 3, 2K IOPS
>>>>> 
>>>>> 
>>>>>> On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy <[email protected]> 
>>>>>> wrote: 
>>>>> 
>>>>>>> Nope, this will not be back ported to Firefly I guess. 
>>>>> 
>>>>> 
>>>>>>> Thanks & Regards 
>>>>> 
>>>>>>> Somnath 
>>>>> 
>>>>> 
>>>>> 
>>>>>> Thanks for sharing this; the first thing I thought of when I looked at
>>>>>> this thread was your patches :)
>>>>>
>>>>>
>>>>>> If Giant incorporates them, both the RDMA support and those patches
>>>>>> should give a huge performance boost for RDMA-enabled Ceph backnets.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> _______________________________________________ 
>>>>> 
>>>>>> ceph-users mailing list 
>>>>> 
>>>>>> [email protected] 
>>>>> 
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> Best Regards, 
>>>>> 
>>>>> 
>>>>> 
>>>>> Wheat 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> Cheers. 
>>>> –––– 
>>>> Sébastien Han 
>>>> Cloud Architect 
>>>> 
>>>> "Always give 100%. Unless you're giving blood." 
>>>> 
>>>> Phone: +33 (0)1 49 70 99 72 
>>>> Mail: [email protected] 
>>>> Address : 11 bis, rue Roquépine - 75008 Paris 
>>>> Web : www.enovance.com - Twitter : @enovance 
>>>> 
>>> 
>>> 
>> 
>> 
>> 


