So what I am gleaning from this is that it is better to have more than 3
OSDs, since each OSD seems to add additional processing overhead when using
small blocks.
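
As a sanity check on that, here is some rough shell arithmetic over the 8K
rados bench numbers further down the thread (13770 writes in ~30 s, replica
size 2), assuming the writes spread evenly across the 3 OSDs:

```shell
# 8K write bench below: 13770 writes completed in ~30 s, replica size 2,
# spread across 3 OSDs.
echo $(( 13770 / 30 ))           # ~459 client-visible write IOPS
echo $(( 13770 * 2 / 30 / 3 ))   # ~306 write ops/s landing on each OSD
awk 'BEGIN { printf "%.2f MB/s\n", 13770 * 8192 / 30 / 1048576 }'
```

The ~306 ops/s per OSD lines up with the ~300 IOPS per OSD Greg mentions,
and the computed ~3.59 MB/s is close to the 3.581 MB/s the bench reported,
so per-OSD ops capacity does look like the bottleneck.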

I will try to do some more testing using the same three disks but with 6
or more OSDs.

If the OSD is limited by processing, is it safe to say it would make
sense to just use an SSD for the journal and a spindle disk for data and reads?
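
If that turns out to be the case, I imagine the per-OSD config would look
roughly like this (the host name, device paths, and journal size below are
made up, just to illustrate the SSD-journal/spindle-data split):

```ini
[osd.0]
    host = ceph-test                        # hypothetical host name
    osd data = /var/lib/ceph/osd/ceph-0     # mount point on the spindle disk
    osd journal = /dev/sdb1                 # raw journal partition on the SSD
    osd journal size = 5120                 # journal size in MB
```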


On Tue, Sep 17, 2013 at 5:12 PM, Jason Villalta <[email protected]> wrote:

> Here are the results:
>
> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=dsync
> 8192000000 bytes (8.2 GB) copied, 266.873 s, 30.7 MB/s
>
>
>
>
> On Tue, Sep 17, 2013 at 5:03 PM, Gregory Farnum <[email protected]> wrote:
>
>> Try it with oflag=dsync instead? I'm curious what kind of variation
>> these disks will provide.
>>
>> Anyway, you're not going to get the same kind of performance with
>> RADOS on 8k sync IO that you will with a local FS. It needs to
>> traverse the network and go through work queues in the daemon; your
>> primary limiter here is probably the per-request latency that you're
>> seeing (average ~30 ms, looking at the rados bench results). The good
>> news is that means you should be able to scale out to a lot of
>> clients, and if you don't force those 8k sync IOs (which RBD won't,
>> unless the application asks for them by itself using directIO or
>> frequent fsync or whatever) your performance will go way up.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
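
Greg's latency point can be checked with Little's law (throughput =
concurrency / latency): rados bench keeps 16 ops in flight by default, and
the 8K write run below reported ~0.0349 s average latency. A quick sketch:

```shell
# Little's law: sustained IOPS = ops in flight / per-op latency.
# rados bench default concurrency is 16; measured avg latency ~0.0349 s.
awk 'BEGIN { iops = 16 / 0.0349
             printf "%.0f IOPS\n", iops
             printf "%.2f MB/s at 8K\n", iops * 8192 / 1048576 }'
```

The predicted ~3.58 MB/s is almost exactly the 3.581 MB/s the 8K bench
reported, which supports the idea that per-request latency, not disk speed,
is the ceiling here.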
>>
>>
>> On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta <[email protected]>
>> wrote:
>> >
>> > Here are the stats with direct io.
>> >
>> > dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
>> > 8192000000 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s
>> >
>> > dd if=ddbenchfile of=/dev/null bs=8K
>> > 8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s
>> >
>> > These numbers are still overall much faster than when using RADOS
>> bench.
>> > The replica is set to 2.  The Journals are on the same disk but
>> separate partitions.
>> >
>> > I kept the block size the same 8K.
>> >
>> >
>> >
>> >
>> > On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill <
>> [email protected]> wrote:
>> >>
>> >> As Gregory mentioned, your 'dd' test looks to be reading from the
>> cache (you are writing 8GB in, and then reading that 8GB out, so the reads
>> are all cached reads) so the performance is going to seem good.  You can
>> add the 'oflag=direct' to your dd test to try and get a more accurate
>> reading from that.
>> >>
>> >> RADOS performance from what I've seen is largely going to hinge on
>> replica size and journal location.  Are your journals on separate disks or
>> on the same disk as the OSD?  What is the replica size of your pool?
>> >>
>> >> ________________________________
>> >> From: "Jason Villalta" <[email protected]>
>> >> To: "Bill Campbell" <[email protected]>
>> >> Cc: "Gregory Farnum" <[email protected]>, "ceph-users" <
>> [email protected]>
>> >> Sent: Tuesday, September 17, 2013 11:31:43 AM
>> >>
>> >> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>> >>
>> >> Thanks for your feedback, it is helpful.
>> >>
>> >> I may have been wrong about the default Windows block size.  What
>> would be the best tests to compare native performance of the SSD disks at
>> 4K blocks vs Ceph performance with 4K blocks?  It just seems there is a
>> huge difference in the results.
>> >>
>> >>
>> >> On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill <
>> [email protected]> wrote:
>> >>>
>> >>> Windows default (NTFS) is a 4k block.  Are you changing the
>> allocation unit to 8k as a default for your configuration?
>> >>>
>> >>> ________________________________
>> >>> From: "Gregory Farnum" <[email protected]>
>> >>> To: "Jason Villalta" <[email protected]>
>> >>> Cc: [email protected]
>> >>> Sent: Tuesday, September 17, 2013 10:40:09 AM
>> >>> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
>> >>>
>> >>>
>> >>> Your 8k-block dd test is not nearly the same as your 8k-block rados
>> bench or SQL tests. Both rados bench and SQL require the write to be
>> committed to disk before moving on to the next one; dd is simply writing
>> into the page cache. So you're not going to get 460 or even 273MB/s with
>> sync 8k writes regardless of your settings.
>> >>>
>> >>> However, I think you should be able to tune your OSDs into somewhat
>> better numbers -- that rados bench is giving you ~300 IOPS on every OSD
>> (with a small pipeline!), and an SSD-based daemon should be going faster.
>> What kind of logging are you running with and what configs have you set?
>> >>>
>> >>> Hopefully you can get Mark or Sam or somebody who's done some
>> performance tuning to offer some tips as well. :)
>> >>> -Greg
>> >>>
>> >>> On Tuesday, September 17, 2013, Jason Villalta wrote:
>> >>>>
>> >>>> Hello all,
>> >>>> I am new to the list.
>> >>>>
>> >>>> I have a single machine set up for testing Ceph.  It has dual
>> 6-core processors (12 cores total) and 128GB of RAM.  I also have 3 Intel
>> 520 240GB SSDs and an OSD set up on each disk, with the OSD data and
>> journal in separate partitions formatted with ext4.
>> >>>>
>> >>>> My goal here is to prove just how fast Ceph can go and what kind of
>> performance to expect when using it as back-end storage for virtual
>> machines, mostly Windows.  I would also like to try to understand how it
>> will scale IO by removing one disk of the three and redoing the benchmark
>> tests, but that is secondary.  So far, here are my results.  I am aware
>> this is all sequential; I just want to know how fast it can go.
>> >>>>
>> >>>> DD IO test of SSD disks:  I am testing 8K blocks since that is the
>> default block size of Windows.
>> >>>>  dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
>> >>>> 8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
>> >>>>
>> >>>> dd if=ddbenchfile of=/dev/null bs=8K
>> >>>> 8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s
>> >>>>
>> >>>> RADOS bench test with 3 SSD disks and 4MB object size(Default):
>> >>>> rados --no-cleanup bench -p pbench 30 write
>> >>>> Total writes made:      2061
>> >>>> Write size:             4194304
>> >>>> Bandwidth (MB/sec):     273.004
>> >>>>
>> >>>> Stddev Bandwidth:       67.5237
>> >>>> Max bandwidth (MB/sec): 352
>> >>>> Min bandwidth (MB/sec): 0
>> >>>> Average Latency:        0.234199
>> >>>> Stddev Latency:         0.130874
>> >>>> Max latency:            0.867119
>> >>>> Min latency:            0.039318
>> >>>> -----
>> >>>> rados bench -p pbench 30 seq
>> >>>> Total reads made:     2061
>> >>>> Read size:            4194304
>> >>>> Bandwidth (MB/sec):    956.466
>> >>>>
>> >>>> Average Latency:       0.0666347
>> >>>> Max latency:           0.208986
>> >>>> Min latency:           0.011625
>> >>>>
>> >>>> This all looks like I would expect from using three disks.  The
>> problems appear to come with the 8K blocks/object size.
>> >>>>
>> >>>> RADOS bench test with 3 SSD disks and 8K object size(8K blocks):
>> >>>> rados --no-cleanup bench -b 8192 -p pbench 30 write
>> >>>> Total writes made:      13770
>> >>>> Write size:             8192
>> >>>> Bandwidth (MB/sec):     3.581
>> >>>>
>> >>>> Stddev Bandwidth:       1.04405
>> >>>> Max bandwidth (MB/sec): 6.19531
>> >>>> Min bandwidth (MB/sec): 0
>> >>>> Average Latency:        0.0348977
>> >>>> Stddev Latency:         0.0349212
>> >>>> Max latency:            0.326429
>> >>>> Min latency:            0.0019
>> >>>> ------
>> >>>> rados bench -b 8192 -p pbench 30 seq
>> >>>> Total reads made:     13770
>> >>>> Read size:            8192
>> >>>> Bandwidth (MB/sec):    52.573
>> >>>>
>> >>>> Average Latency:       0.00237483
>> >>>> Max latency:           0.006783
>> >>>> Min latency:           0.000521
>> >>>>
>> >>>> So are these performance numbers correct, or is this something I
>> missed in the testing procedure?  The RADOS bench numbers with the 8K
>> block size are the same as we see when testing performance in a VM with
>> SQLIO.  Does anyone know of any configuration changes that are needed to
>> get the Ceph performance closer to native performance with 8K blocks?
>> >>>>
>> >>>> Thanks in advance.
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> --
>> >>>> Jason Villalta
>> >>>> Co-founder
>> >>>> 800.799.4407x1230 | www.RubixTechnology.com
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> ceph-users mailing list
>> >>> [email protected]
>> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>
>> >>>
>> >>> NOTICE: Protect the information in this message in accordance with
>> the company's security policies. If you received this message in error,
>> immediately notify the sender and destroy all copies.
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>
>
>
>





