Here are the results:

dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=dsync
8192000000 bytes (8.2 GB) copied, 266.873 s, 30.7 MB/s
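The dsync dd gives a rough floor, but for a more apples-to-apples
native-vs-Ceph comparison a synchronous small-block fio run is
probably the better tool. A minimal sketch -- the file and device
paths are placeholders, and note that writing to a raw RBD device
destroys whatever is on it:

# Native SSD: synchronous 8K writes to a file on the local FS
fio --name=ssd-sync8k --filename=./fiotest --size=1G --rw=write \
    --bs=8k --direct=1 --sync=1 --runtime=30 --time_based

# Same workload against a mapped RBD image (placeholder device path)
fio --name=rbd-sync8k --filename=/dev/rbd0 --rw=write \
    --bs=8k --direct=1 --sync=1 --runtime=30 --time_based

--sync=1 opens the target O_SYNC, which is roughly what oflag=dsync
does for dd, so the two sets of numbers should be comparable.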
On Tue, Sep 17, 2013 at 5:03 PM, Gregory Farnum <[email protected]> wrote:
> Try it with oflag=dsync instead? I'm curious what kind of variation
> these disks will provide.
>
> Anyway, you're not going to get the same kind of performance with
> RADOS on 8k sync IO that you will with a local FS. It needs to
> traverse the network and go through work queues in the daemon; your
> primary limiter here is probably the per-request latency that you're
> seeing (average ~30 ms, looking at the rados bench results). The good
> news is that means you should be able to scale out to a lot of
> clients, and if you don't force those 8k sync IOs (which RBD won't,
> unless the application asks for them itself using direct IO or
> frequent fsync or whatever) your performance will go way up.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta <[email protected]> wrote:
> >
> > Here are the stats with direct IO.
> >
> > dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
> > 8192000000 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s
> >
> > dd if=ddbenchfile of=/dev/null bs=8K
> > 8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s
> >
> > These numbers are still much faster overall than with RADOS bench.
> > The replica count is set to 2. The journals are on the same disk as
> > the OSDs, but in separate partitions.
> >
> > I kept the block size the same, 8K.
> >
> > On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill
> > <[email protected]> wrote:
> >>
> >> As Gregory mentioned, your 'dd' test looks to be reading from the
> >> cache (you are writing 8GB in and then reading that 8GB out, so
> >> the reads are all cached reads), and the performance is going to
> >> seem good. You can add 'oflag=direct' to your dd test to try to
> >> get a more accurate reading.
> >>
> >> RADOS performance, from what I've seen, is largely going to hinge
> >> on replica size and journal location. Are your journals on
> >> separate disks or on the same disk as the OSD? What is the replica
> >> size of your pool?
> >>
> >> ________________________________
> >> From: "Jason Villalta" <[email protected]>
> >> To: "Bill Campbell" <[email protected]>
> >> Cc: "Gregory Farnum" <[email protected]>, "ceph-users"
> >> <[email protected]>
> >> Sent: Tuesday, September 17, 2013 11:31:43 AM
> >> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
> >>
> >> Thanks for your feedback; it is helpful.
> >>
> >> I may have been wrong about the default Windows block size. What
> >> would be the best tests to compare native performance of the SSD
> >> disks at 4K blocks vs Ceph performance with 4K blocks? It just
> >> seems there is a huge difference in the results.
> >>
> >> On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill
> >> <[email protected]> wrote:
> >>>
> >>> Windows default (NTFS) is a 4k block. Are you changing the
> >>> allocation unit to 8k as a default for your configuration?
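A side note for anyone reproducing the Windows end of this: the
allocation unit of an existing NTFS volume shows up as "Bytes Per
Cluster" in fsutil output, and it is normally fixed at format time.
The drive letters below are just examples:

fsutil fsinfo ntfsinfo C:
format E: /A:8192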
> >>> ________________________________
> >>> From: "Gregory Farnum" <[email protected]>
> >>> To: "Jason Villalta" <[email protected]>
> >>> Cc: [email protected]
> >>> Sent: Tuesday, September 17, 2013 10:40:09 AM
> >>> Subject: Re: [ceph-users] Ceph performance with 8K blocks.
> >>>
> >>> Your 8k-block dd test is not nearly the same as your 8k-block
> >>> rados bench or SQL tests. Both rados bench and SQL require the
> >>> write to be committed to disk before moving on to the next one;
> >>> dd is simply writing into the page cache. So you're not going to
> >>> get 460 or even 273 MB/s with sync 8k writes regardless of your
> >>> settings.
> >>>
> >>> However, I think you should be able to tune your OSDs into
> >>> somewhat better numbers -- that rados bench is giving you ~300
> >>> IOPS on every OSD (with a small pipeline!), and an SSD-based
> >>> daemon should be going faster. What kind of logging are you
> >>> running with, and what configs have you set?
> >>>
> >>> Hopefully you can get Mark or Sam or somebody who's done some
> >>> performance tuning to offer some tips as well. :)
> >>> -Greg
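On Greg's "small pipeline" point: rados bench defaults to 16
concurrent operations, so one quick check of whether per-request
latency (rather than disk throughput) is the ceiling is to rerun the
8K write test with a deeper queue via -t, e.g.:

rados bench -b 8192 -t 64 -p pbench 30 write

If the aggregate bandwidth scales up with -t, the SSDs have headroom
and the 8K number is latency-bound rather than disk-bound.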
> >>> On Tuesday, September 17, 2013, Jason Villalta wrote:
> >>>>
> >>>> Hello all,
> >>>> I am new to the list.
> >>>>
> >>>> I have a single machine set up for testing Ceph. It has dual
> >>>> 6-core processors (12 cores total) and 128GB of RAM. I also have
> >>>> 3 Intel 520 240GB SSDs, with an OSD on each disk and the OSD and
> >>>> journal in separate partitions formatted with ext4.
> >>>>
> >>>> My goal here is to prove just how fast Ceph can go and what kind
> >>>> of performance to expect when using it as back-end storage for
> >>>> virtual machines, mostly Windows. I would also like to try to
> >>>> understand how it scales IO by removing one disk of the three
> >>>> and redoing the benchmark tests, but that is secondary. So far
> >>>> here are my results. I am aware this is all sequential; I just
> >>>> want to know how fast it can go.
> >>>>
> >>>> DD IO test of the SSD disks (I am testing 8K blocks since that
> >>>> is the default block size of Windows):
> >>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
> >>>> 8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
> >>>>
> >>>> dd if=ddbenchfile of=/dev/null bs=8K
> >>>> 8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s
> >>>>
> >>>> RADOS bench test with 3 SSD disks and 4MB object size (default):
> >>>> rados --no-cleanup bench -p pbench 30 write
> >>>> Total writes made: 2061
> >>>> Write size: 4194304
> >>>> Bandwidth (MB/sec): 273.004
> >>>> Stddev Bandwidth: 67.5237
> >>>> Max bandwidth (MB/sec): 352
> >>>> Min bandwidth (MB/sec): 0
> >>>> Average Latency: 0.234199
> >>>> Stddev Latency: 0.130874
> >>>> Max latency: 0.867119
> >>>> Min latency: 0.039318
> >>>> -----
> >>>> rados bench -p pbench 30 seq
> >>>> Total reads made: 2061
> >>>> Read size: 4194304
> >>>> Bandwidth (MB/sec): 956.466
> >>>> Average Latency: 0.0666347
> >>>> Max latency: 0.208986
> >>>> Min latency: 0.011625
> >>>>
> >>>> This all looks like what I would expect from three disks. The
> >>>> problems appear to come with the 8K block/object size.
> >>>>
> >>>> RADOS bench test with 3 SSD disks and 8K object size (8K blocks):
> >>>> rados --no-cleanup bench -b 8192 -p pbench 30 write
> >>>> Total writes made: 13770
> >>>> Write size: 8192
> >>>> Bandwidth (MB/sec): 3.581
> >>>> Stddev Bandwidth: 1.04405
> >>>> Max bandwidth (MB/sec): 6.19531
> >>>> Min bandwidth (MB/sec): 0
> >>>> Average Latency: 0.0348977
> >>>> Stddev Latency: 0.0349212
> >>>> Max latency: 0.326429
> >>>> Min latency: 0.0019
> >>>> ------
> >>>> rados bench -b 8192 -p pbench 30 seq
> >>>> Total reads made: 13770
> >>>> Read size: 8192
> >>>> Bandwidth (MB/sec): 52.573
> >>>> Average Latency: 0.00237483
> >>>> Max latency: 0.006783
> >>>> Min latency: 0.000521
> >>>>
> >>>> So are these performance numbers correct, or is there something
> >>>> I missed in the testing procedure? The RADOS bench numbers with
> >>>> the 8K block size are the same ones we see when testing
> >>>> performance in a VM with SQLIO. Does anyone know of any
> >>>> configuration changes needed to get Ceph performance closer to
> >>>> native performance with 8K blocks?
> >>>>
> >>>> Thanks in advance.
> >>>>
> >>>> --
> >>>> Jason Villalta
> >>>> Co-founder
> >>>> 800.799.4407x1230 | www.RubixTechnology.com

--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com
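P.S. For anyone who lands on this thread with the same question: the
knobs that usually come up in these tuning discussions are OSD logging
levels and journal/filestore behavior. A sketch of the sort of [osd]
section people experiment with -- the values here are illustrative
only, not recommendations, so check the documentation for your
release:

[osd]
    debug osd = 0/0
    debug ms = 0/0
    debug filestore = 0/0
    debug journal = 0/0
    osd op threads = 8
    filestore max sync interval = 10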
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
