So what I am gleaning from this is that it is better to have more than 3 OSDs, since each OSD daemon seems to add processing overhead when using small blocks. I will try some more testing using the same three disks but with 6 or more OSDs. If the OSD daemon is the limiter, is it safe to say it would make sense to just use an SSD for the journal and a spindle disk for the data and reads?
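If I do the math on my 8K rados bench numbers below, the per-request overhead story fits: ~35 ms average latency with rados bench's default of 16 concurrent ops works out to roughly 460 writes/s, and with 2 replicas spread across 3 OSDs that is about 300 ops/s per OSD, which lines up with Greg's estimate. Here is a rough sketch of how I plan to carve each SSD into two OSDs for the next test (device names, partition sizes, mount points, and the host name ceph-node1 are placeholders, and the ceph-deploy syntax is from memory, so treat this as a sketch rather than a recipe):

    # Split one SSD (/dev/sdb here; repeat for each of the three disks)
    # into two data partitions and two journal partitions.
    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart osd-data-a    0%  45%
    parted -s /dev/sdb mkpart osd-journal-a 45% 50%
    parted -s /dev/sdb mkpart osd-data-b    50% 95%
    parted -s /dev/sdb mkpart osd-journal-b 95% 100%

    # ext4 on the data partitions, mounted where the OSDs will live.
    mkfs.ext4 /dev/sdb1 && mkdir -p /srv/osd-sdb1 && mount /dev/sdb1 /srv/osd-sdb1
    mkfs.ext4 /dev/sdb3 && mkdir -p /srv/osd-sdb3 && mount /dev/sdb3 /srv/osd-sdb3

    # Two OSDs on this disk, each journaling to its own raw partition.
    ceph-deploy osd prepare  ceph-node1:/srv/osd-sdb1:/dev/sdb2
    ceph-deploy osd prepare  ceph-node1:/srv/osd-sdb3:/dev/sdb4
    ceph-deploy osd activate ceph-node1:/srv/osd-sdb1:/dev/sdb2
    ceph-deploy osd activate ceph-node1:/srv/osd-sdb3:/dev/sdb4

If the aggregate 8K IOPS stays roughly the same with 6 OSDs, that would point at per-request latency rather than OSD processing, and the SSD-journal/spindle-data layout starts to look more attractive.
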
On Tue, Sep 17, 2013 at 5:12 PM, Jason Villalta <[email protected]> wrote:
> Here are the results:
>
> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=dsync
> 8192000000 bytes (8.2 GB) copied, 266.873 s, 30.7 MB/s
>
>
> On Tue, Sep 17, 2013 at 5:03 PM, Gregory Farnum <[email protected]> wrote:
>
>> Try it with oflag=dsync instead? I'm curious what kind of variation
>> these disks will provide.
>>
>> Anyway, you're not going to get the same kind of performance with
>> RADOS on 8k sync IO that you will with a local FS. It needs to
>> traverse the network and go through work queues in the daemon; your
>> primary limiter here is probably the per-request latency that you're
>> seeing (average ~30 ms, looking at the rados bench results). The good
>> news is that means you should be able to scale out to a lot of
>> clients, and if you don't force those 8k sync IOs (which RBD won't,
>> unless the application asks for them by itself using directIO or
>> frequent fsync or whatever) your performance will go way up.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta <[email protected]> wrote:
>> >
>> > Here are the stats with direct io.
>> >
>> > dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
>> > 8192000000 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s
>> >
>> > dd if=ddbenchfile of=/dev/null bs=8K
>> > 8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s
>> >
>> > These numbers are still overall much faster than when using RADOS bench.
>> > The replica is set to 2. The journals are on the same disk but in
>> > separate partitions.
>> >
>> > I kept the block size the same, 8K.
>> >
>> >
>> > On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill <[email protected]> wrote:
>> >>
>> >> As Gregory mentioned, your 'dd' test looks to be reading from the
>> >> cache (you are writing 8GB in, and then reading that 8GB out, so the reads
>> >> are all cached reads) so the performance is going to seem good. You can
>> >> add the 'oflag=direct' to your dd test to try and get a more accurate
>> >> reading from that.
>> >>
>> >> RADOS performance from what I've seen is largely going to hinge on
>> >> replica size and journal location. Are your journals on separate disks or
>> >> on the same disk as the OSD? What is the replica size of your pool?
>> >>
>> >> ________________________________
>> >> From: "Jason Villalta" <[email protected]>
>> >> To: "Bill Campbell" <[email protected]>
>> >> Cc: "Gregory Farnum" <[email protected]>, "ceph-users" <[email protected]>
>> >> Sent: Tuesday, September 17, 2013 11:31:43 AM
>> >> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>> >>
>> >> Thanks for your feedback, it is helpful.
>> >>
>> >> I may have been wrong about the default Windows block size. What
>> >> would be the best tests to compare native performance of the SSD disks at
>> >> 4K blocks vs Ceph performance with 4K blocks? It just seems there is a
>> >> huge difference in the results.
>> >>
>> >>
>> >> On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill <[email protected]> wrote:
>> >>>
>> >>> Windows default (NTFS) is a 4k block. Are you changing the
>> >>> allocation unit to 8k as a default for your configuration?
>> >>>
>> >>> ________________________________
>> >>> From: "Gregory Farnum" <[email protected]>
>> >>> To: "Jason Villalta" <[email protected]>
>> >>> Cc: [email protected]
>> >>> Sent: Tuesday, September 17, 2013 10:40:09 AM
>> >>> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>> >>>
>> >>> Your 8k-block dd test is not nearly the same as your 8k-block rados
>> >>> bench or SQL tests. Both rados bench and SQL require the write to be
>> >>> committed to disk before moving on to the next one; dd is simply writing
>> >>> into the page cache. So you're not going to get 460 or even 273 MB/s with
>> >>> sync 8k writes regardless of your settings.
>> >>>
>> >>> However, I think you should be able to tune your OSDs into somewhat
>> >>> better numbers -- that rados bench is giving you ~300 IOPs on every OSD
>> >>> (with a small pipeline!), and an SSD-based daemon should be going faster.
>> >>> What kind of logging are you running with and what configs have you set?
>> >>>
>> >>> Hopefully you can get Mark or Sam or somebody who's done some
>> >>> performance tuning to offer some tips as well. :)
>> >>> -Greg
>> >>>
>> >>> On Tuesday, September 17, 2013, Jason Villalta wrote:
>> >>>>
>> >>>> Hello all,
>> >>>> I am new to the list.
>> >>>>
>> >>>> I have a single machine set up for testing Ceph. It has dual 6-core
>> >>>> processors (12 cores total) and 128GB of RAM. I also have 3 Intel 520
>> >>>> 240GB SSDs, with an OSD set up on each disk and the OSD data and journal
>> >>>> in separate partitions formatted with ext4.
>> >>>>
>> >>>> My goal here is to prove just how fast Ceph can go and what kind of
>> >>>> performance to expect when using it as back-end storage for virtual
>> >>>> machines, mostly Windows. I would also like to try to understand how it
>> >>>> will scale IO by removing one disk of the three and repeating the benchmark
>> >>>> tests, but that is secondary. So far here are my results. I am aware
>> >>>> this is all sequential; I just want to know how fast it can go.
>> >>>>
>> >>>> DD IO test of the SSD disks (I am testing 8K blocks since that is the
>> >>>> default block size of Windows):
>> >>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
>> >>>> 8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
>> >>>>
>> >>>> dd if=ddbenchfile of=/dev/null bs=8K
>> >>>> 8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s
>> >>>>
>> >>>> RADOS bench test with 3 SSD disks and 4MB object size (default):
>> >>>> rados --no-cleanup bench -p pbench 30 write
>> >>>> Total writes made:      2061
>> >>>> Write size:             4194304
>> >>>> Bandwidth (MB/sec):     273.004
>> >>>>
>> >>>> Stddev Bandwidth:       67.5237
>> >>>> Max bandwidth (MB/sec): 352
>> >>>> Min bandwidth (MB/sec): 0
>> >>>> Average Latency:        0.234199
>> >>>> Stddev Latency:         0.130874
>> >>>> Max latency:            0.867119
>> >>>> Min latency:            0.039318
>> >>>> -----
>> >>>> rados bench -p pbench 30 seq
>> >>>> Total reads made:       2061
>> >>>> Read size:              4194304
>> >>>> Bandwidth (MB/sec):     956.466
>> >>>>
>> >>>> Average Latency:        0.0666347
>> >>>> Max latency:            0.208986
>> >>>> Min latency:            0.011625
>> >>>>
>> >>>> This all looks like what I would expect from three disks. The
>> >>>> problems appear to come with the 8K block/object size.
>> >>>>
>> >>>> RADOS bench test with 3 SSD disks and 8K object size (8K blocks):
>> >>>> rados --no-cleanup bench -b 8192 -p pbench 30 write
>> >>>> Total writes made:      13770
>> >>>> Write size:             8192
>> >>>> Bandwidth (MB/sec):     3.581
>> >>>>
>> >>>> Stddev Bandwidth:       1.04405
>> >>>> Max bandwidth (MB/sec): 6.19531
>> >>>> Min bandwidth (MB/sec): 0
>> >>>> Average Latency:        0.0348977
>> >>>> Stddev Latency:         0.0349212
>> >>>> Max latency:            0.326429
>> >>>> Min latency:            0.0019
>> >>>> ------
>> >>>> rados bench -b 8192 -p pbench 30 seq
>> >>>> Total reads made:       13770
>> >>>> Read size:              8192
>> >>>> Bandwidth (MB/sec):     52.573
>> >>>>
>> >>>> Average Latency:        0.00237483
>> >>>> Max latency:            0.006783
>> >>>> Min latency:            0.000521
>> >>>>
>> >>>> So is this performance correct, or is there something I missed in the
>> >>>> testing procedure? The RADOS bench numbers with the 8K block size are the
>> >>>> same as we see when testing performance in a VM with SQLIO. Does anyone know
>> >>>> of any configuration changes that are needed to get the Ceph performance
>> >>>> closer to native performance with 8K blocks?
>> >>>>
>> >>>> Thanks in advance.
>> >>>>
>> >>>> --
>> >>>> Jason Villalta
>> >>>> Co-founder
>> >>>> 800.799.4407x1230 | www.RubixTechnology.com
>> >>>
>> >>> --
>> >>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >>

--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
