Re: [ceph-users] Performance question

Bill Sanders Tue, 24 Nov 2015 11:21:01 -0800

I think what Nick is suggesting is that you create Nx5GB partitions on the
SSD's (where N is the number of OSD's you want to have fast journals for),
and use the rest of the space for OSDs that would form the SSD pool.
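
The layout described above can be sketched as simple arithmetic. This is only an illustration: the 5GB journal size is the estimate quoted in this thread, while the 400GB SSD size and the function name `plan_ssd_layout` are made-up assumptions, since the thread never states the SSD capacity.

```python
def plan_ssd_layout(ssd_size_gb, n_journals, journal_gb=5):
    """Split one SSD into n journal partitions plus one OSD data partition.

    journal_gb=5 follows the ~5GB-per-journal estimate quoted in this thread;
    ssd_size_gb is a hypothetical input, the thread never states the SSD size.
    """
    journal_total = n_journals * journal_gb
    if journal_total >= ssd_size_gb:
        raise ValueError("journals would leave no room for the SSD OSD")
    return journal_total, ssd_size_gb - journal_total


# Hypothetical 400GB SSD hosting the journal of one spinner OSD.
journal_gb, osd_gb = plan_ssd_layout(400, 1)
print(f"journals: {journal_gb}GB, SSD OSD: {osd_gb}GB")
```

With one spinner journal per SSD, nearly all of the SSD stays available to the SSD pool.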


Bill

On Tue, Nov 24, 2015 at 10:56 AM, Marek Dohojda <
[email protected]> wrote:

> Oh, well in that case you made my life easier, I like that :)
>
> I thought Journal needed to be on a physical space though, not within raw
> rbd pool.  Was I mistaken?
>
> On Tue, Nov 24, 2015 at 11:51 AM, Nick Fisk <[email protected]> wrote:
>
>> Ok, but it’s probably a bit of a waste. The journals for each disk will
>> probably require 200-300 IOPS from each SSD and maybe 5GB of space.
>> Personally I would keep the SSD pool, maybe use it for high perf VM’s?
>>
>>
>>
>> Typically VM’s will generate more random smaller IO’s so a default rados
>> bench might not be a true example of expected performance.
>>
>>
>>
>> *From:* ceph-users [mailto:[email protected]] *On Behalf
>> Of *Marek Dohojda
>> *Sent:* 24 November 2015 18:47
>> *To:* Nick Fisk <[email protected]>
>>
>> *Cc:* [email protected]
>> *Subject:* Re: [ceph-users] Performance question
>>
>>
>>
>> I dunno, I think I just go into my Lotus and mull this over ;) (I wish)
>>
>> This is a storage for a KVM, and we have quite a few boxes.  While right
>> now none are suffering from IO load, I am seeing slowdown personally and
>> know that sooner or later others will notice as well.
>>
>>
>>
>> I think what I will do is remove the SSD from the cluster, and put
>> journals on it.
>>
>>
>>
>> On Tue, Nov 24, 2015 at 11:42 AM, Nick Fisk <[email protected]> wrote:
>>
>> Separate would be best, but as with many things in life we are not all
>> driving around in sports cars!!
>>
>>
>>
>> Moving the journals to the SSD’s that are also OSD’s themselves will be
>> fine. SSD’s tend to be more bandwidth limited than IOPs and the reverse is
>> true for Disks, so you will get maybe 2x improvement for the disk pool and
>> you probably won’t even notice the impact on the SSD pool.
>>
>>
>>
>> Can I just ask what your workload will be? There may be other things that
>> can be done.
>>
>>
>>
>> *From:* ceph-users [mailto:[email protected]] *On Behalf
>> Of *Marek Dohojda
>> *Sent:* 24 November 2015 18:32
>> *To:* Alan Johnson <[email protected]>
>> *Cc:* [email protected]; Nick Fisk <[email protected]>
>>
>>
>> *Subject:* Re: [ceph-users] Performance question
>>
>>
>>
>> Thank you! I will do that.  Would you suggest getting another SSD drive
>> or move the journal to the SSD OSD?
>>
>>
>>
>> (Sorry for a stupid question, if that is such).
>>
>>
>>
>> On Tue, Nov 24, 2015 at 11:25 AM, Alan Johnson <[email protected]>
>> wrote:
>>
>> Or separate the journals as this will bring the workload down on the
>> spinners to 3X rather than 6X
>>
>>
>>
>> *From:* Marek Dohojda [mailto:[email protected]]
>> *Sent:* Tuesday, November 24, 2015 1:24 PM
>> *To:* Nick Fisk
>> *Cc:* Alan Johnson; [email protected]
>>
>>
>> *Subject:* Re: [ceph-users] Performance question
>>
>>
>>
>> Crad I think you are 100% correct:
>>
>>
>>
>> rrqm/s  wrqm/s    r/s      w/s   rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm   %util
>>   0.00  369.00  33.00  1405.00  132.00  135656.00    188.86      5.61   4.02    21.94     3.60   0.70  100.00
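
As a rough cross-check of the iostat sample above, the average request size can be recomputed from the request and throughput columns. This is a sketch, assuming the usual sysstat convention that avgrq-sz is reported in 512-byte sectors; the function name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: iostat reports avgrq-sz in 512-byte sectors


def avg_request_sectors(r_s, w_s, rkb_s, wkb_s):
    """Average request size in sectors, from requests/s and kB/s columns."""
    total_kb = rkb_s + wkb_s
    total_reqs = r_s + w_s
    return total_kb * 1024 / total_reqs / SECTOR_BYTES


# Values copied from the iostat line quoted in this message.
sectors = avg_request_sectors(33.00, 1405.00, 132.00, 135656.00)
write_mb_s = 135656.00 / 1024

print(f"avg request: {sectors:.2f} sectors (~{sectors * SECTOR_BYTES / 1024:.1f} kB)")
print(f"write throughput: {write_mb_s:.1f} MB/s at 100% util")
```

The recomputed size agrees with the avgrq-sz column, so the device is busy with writes of roughly 94 kB each while showing 100% utilisation.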
>>
>>
>>
>> I was kinda wondering whether this may be the case, which is why I was
>> wondering if I should be doing too much in terms of troubleshooting.
>>
>>
>>
>> So basically what you are saying is that I need to wait for a new version?
>>
>>
>>
>>
>>
>> Thank you very much everybody!
>>
>>
>>
>>
>>
>> On Tue, Nov 24, 2015 at 9:35 AM, Nick Fisk <[email protected]> wrote:
>>
>> You haven’t stated what size replication you are running. Keep in mind
>> that with a replication factor of 3, you will be writing 6x the amount of
>> data down to disks than what the benchmark says (3x replication x2 for
>> data+journal write).
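
The 6x figure can be worked through with the SAS rados bench number quoted further down this thread. This is a sketch only: the replication factor of 3 is the example used above and was never confirmed by the poster, and spreading the load evenly over 7 spinners is an assumption. The function name is hypothetical.

```python
def raw_write_mb_s(client_mb_s, replication=3, journal_on_same_disk=True):
    """Raw MB/s hitting the data disks for a given client write rate.

    Each client write is stored `replication` times, and each copy is
    written twice (journal + data) when the journal shares the disk.
    """
    journal_factor = 2 if journal_on_same_disk else 1
    return client_mb_s * replication * journal_factor


client = 50.471  # SAS rados bench bandwidth (MB/sec) quoted in this thread
spinners = 7     # SAS OSDs in the cluster

shared = raw_write_mb_s(client)                                # 6x
separate = raw_write_mb_s(client, journal_on_same_disk=False)  # 3x

print(f"journal on same disk: {shared:.1f} MB/s total, {shared / spinners:.1f} per disk")
print(f"journal on SSD:       {separate:.1f} MB/s total, {separate / spinners:.1f} per disk")
```

Under these assumptions each spinner absorbs roughly 43 MB/s of mostly non-sequential writes, which drops by half once the journals move off the spinners.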
>>
>>
>>
>> You might actually be near the hardware maximums. What does iostat look
>> like whilst you are running rados bench, are the disks getting maxed out?
>>
>>
>>
>> *From:* ceph-users [mailto:[email protected]] *On Behalf
>> Of *Marek Dohojda
>> *Sent:* 24 November 2015 16:27
>> *To:* Alan Johnson <[email protected]>
>>
>>
>> *Cc:* [email protected]
>> *Subject:* Re: [ceph-users] Performance question
>>
>>
>>
>> 7 total servers, 20 GIG pipe between servers, both reads and writes.  The
>> network itself has plenty of pipe left, it is averaging 40Mbits/s
>>
>>
>>
>> Rados Bench SAS 30 writes
>>
>> Total time run:         30.591927
>> Total writes made:      386
>> Write size:             4194304
>> Bandwidth (MB/sec):     50.471
>>
>> Stddev Bandwidth:       48.1052
>> Max bandwidth (MB/sec): 160
>> Min bandwidth (MB/sec): 0
>> Average Latency:        1.25908
>> Stddev Latency:         2.62018
>> Max latency:            21.2809
>> Min latency:            0.029227
>>
>>
>>
>> Rados Bench SSD writes
>>
>> Total time run:         20.425192
>> Total writes made:      1405
>> Write size:             4194304
>> Bandwidth (MB/sec):     275.150
>>
>> Stddev Bandwidth:       122.565
>> Max bandwidth (MB/sec): 576
>> Min bandwidth (MB/sec): 0
>> Average Latency:        0.231803
>> Stddev Latency:         0.190978
>> Max latency:            0.981022
>> Min latency:            0.0265421
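
The bandwidth lines in both runs can be reproduced from the write count, write size and run time. This is a small sketch, assuming "MB" here means 2^20 bytes, which is what the quoted figures are consistent with; the helper name is made up.

```python
MIB = 1024 * 1024  # assumption: the bench's "MB" is 2**20 bytes


def bench_bandwidth_mb_s(writes, write_size_bytes, seconds):
    """Average write bandwidth over the whole run."""
    return writes * write_size_bytes / MIB / seconds


# Numbers copied from the two rados bench outputs in this message.
sas = bench_bandwidth_mb_s(386, 4194304, 30.591927)
ssd = bench_bandwidth_mb_s(1405, 4194304, 20.425192)

print(f"SAS: {sas:.3f} MB/s")
print(f"SSD: {ssd:.3f} MB/s")
print(f"SSD/SAS ratio: {ssd / sas:.2f}x")
```

So on these 4MB writes the SSD pool is about 5.5x faster than the SAS pool.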
>>
>>
>>
>>
>>
>> As you can see SSD is better but not as much as I would expect SSD to be.
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Nov 24, 2015 at 9:10 AM, Alan Johnson <[email protected]>
>> wrote:
>>
>> Hard to know without more config details such as number of servers and
>> network (GigE or 10 GigE). I am also not sure how you are measuring (reads
>> or writes); you could try RADOS bench as a baseline. I would expect more
>> performance with 7 X 10K spinners journaled to SSDs. The fact that SSDs did
>> not perform much better may point to a bottleneck elsewhere, network perhaps?
>>
>> *From:* Marek Dohojda [mailto:[email protected]]
>> *Sent:* Tuesday, November 24, 2015 10:37 AM
>> *To:* Alan Johnson
>> *Cc:* Haomai Wang; [email protected]
>>
>>
>> *Subject:* Re: [ceph-users] Performance question
>>
>>
>>
>> Yeah they are, that is one thing I was planning on changing. What I am
>> really interested in at the moment is a rough idea of expected performance.
>> I mean, is 100MB around normal, very low, or "could be better"?
>>
>>
>>
>> On Tue, Nov 24, 2015 at 8:02 AM, Alan Johnson <[email protected]>
>> wrote:
>>
>> Are the journals on the same device – it might be better to use the SSDs
>> for journaling since you are not getting better performance with SSDs?
>>
>>
>>
>> *From:* ceph-users [mailto:[email protected]] *On Behalf
>> Of *Marek Dohojda
>> *Sent:* Monday, November 23, 2015 10:24 PM
>> *To:* Haomai Wang
>> *Cc:* [email protected]
>> *Subject:* Re: [ceph-users] Performance question
>>
>>
>>
>>  Sorry I should have specified SAS is the 100 MB :) , but to be honest
>> SSD isn't much faster.
>>
>>
>>
>> On Mon, Nov 23, 2015 at 7:38 PM, Haomai Wang <[email protected]>
>> wrote:
>>
>> On Tue, Nov 24, 2015 at 10:35 AM, Marek Dohojda
>> <[email protected]> wrote:
>> > No SSD and SAS are in two separate pools.
>> >
>> > On Mon, Nov 23, 2015 at 7:30 PM, Haomai Wang <[email protected]> wrote:
>> >>
>> >> On Tue, Nov 24, 2015 at 10:23 AM, Marek Dohojda
>> >> <[email protected]> wrote:
>> >> > I have a Hammer Ceph cluster on 7 nodes with total 14 OSDs.  7 of which
>> >> > are SSD and 7 of which are SAS 10K drives.  I get typically about 100MB
>> >> > IO rates on this cluster.
>>
>> So which pool gives you the 100 MB?
>>
>>
>> >>
>> >> You mixed up sas and ssd in one pool?
>> >>
>> >> >
>> >> > I have a simple question.  Is 100MB within my configuration what I
>> >> > should expect, or should it be higher? I am not sure if I should be
>> >> > looking for issues, or just accept what I have.
>> >> >
>> >> > _______________________________________________
>> >> > ceph-users mailing list
>> >> > [email protected]
>>
>> >> >
>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards,
>> >>
>> >> Wheat
>> >
>> >
>>
>> --
>> Best Regards,
>>
>> Wheat
>>
>

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
