forgot to cc the list
---------- Forwarded message ----------
From: "Mark Wu" <[email protected]>
Date: Oct 17, 2014, 12:51 AM
Subject: Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.
To: "Gregory Farnum" <[email protected]>
Cc:

Thanks for the reply. I am not using a single client. Writing to 5 rbd
volumes from 3 hosts is enough to reach the peak. The client is fio,
also running on the osd nodes, but there are no bottlenecks on CPU or
network. I also tried running the clients on two non-osd servers, with
the same result.
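For reference, the fio workload was along these lines; this is a sketch, and the pool name, image name, and client name below are assumptions rather than the exact job file used:

```ini
; 4k random-write job against an rbd image via fio's rbd ioengine
; (sketch; pool, image, and client names are assumed, not from the thread)
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=test-image
rw=randwrite
bs=4k
iodepth=64
runtime=60
time_based

[rbd-4k-randwrite]
```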
On Oct 17, 2014, 12:29 AM, "Gregory Farnum" <[email protected]> wrote:

> If you're running a single client to drive these tests, that's your
> bottleneck. Try running multiple clients and aggregating their numbers.
> -Greg
>
> On Thursday, October 16, 2014, Mark Wu <[email protected]> wrote:
>
>> Hi list,
>>
>> During my test, I found Ceph doesn't scale as I expected on a 30-osd
>> cluster.
>> The following is the information of my setup:
>> HW configuration:
>>    15 Dell R720 servers, and each server has:
>>       Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores, with
>> hyper-threading enabled.
>>       128GB memory
>>       two Intel 3500 SSD disks, connected to a MegaRAID SAS 2208
>> controller, each disk configured as a separate raid0.
>>       two 10GbE nics in a bond, used for both the public network and
>> the cluster network.
>>
>> SW configuration:
>>    OS: CentOS 6.5, kernel 3.17, Ceph 0.86
>>    XFS as file system for data.
>>    each SSD disk has two partitions: one for osd data and the other
>> for the osd journal.
>>    the pool has 2048 pgs and 2 replicas.
>>    5 monitors running on 5 of the 15 servers.
>>    Ceph configuration (in-memory debugging options are disabled):
>>
>> [osd]
>> osd data = /var/lib/ceph/osd/$cluster-$id
>> osd journal = /var/lib/ceph/osd/$cluster-$id/journal
>> osd mkfs type = xfs
>> osd mkfs options xfs = -f -i size=2048
>> osd mount options xfs = rw,noatime,logbsize=256k,delaylog
>> osd journal size = 20480
>> osd mon heartbeat interval = 30
>> # Performance tuning
>> osd_max_backfills = 10
>> osd_recovery_max_active = 15
>> filestore merge threshold = 40
>> filestore split multiple = 8
>> filestore fd cache size = 1024
>> osd op threads = 64
>> # Recovery tuning
>> osd recovery max active = 1
>> osd max backfills = 1
>> osd recovery op priority = 1
>> throttler perf counter = false
>> osd enable op tracker = false
>> filestore_queue_max_ops = 5000
>> filestore_queue_committing_max_ops = 5000
>> journal_max_write_entries = 1000
>> journal_queue_max_ops = 5000
>> objecter_inflight_ops = 8192
>>
>>
>>   When I test with 7 servers (14 osds), the maximum 4k random-write
>> IOPS I see is 17k on a single volume and 44k across the whole cluster.
>> I expected the 30-osd cluster to approach 90k, but unfortunately it
>> delivers roughly the same performance as 14 osds, and sometimes worse.
>> I checked the iostat output on all the nodes; the numbers are similar
>> across nodes, so the load is well distributed, but disk utilization is
>> low. In the test with 14 osds, disk utilization is higher (80%~90%).
>> So do you have any tuning suggestions to improve the performance with
>> 30 osds?
>> Any feedback is appreciated.
>>
>>
>> iostat output:
>> Device:  rrqm/s  wrqm/s   r/s     w/s    rsec/s    wsec/s  avgrq-sz avgqu-sz  await  svctm  %util
>> sda        0.00    0.00  0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00   0.00
>> sdb        0.00   88.50  0.00 5188.00     0.00  93397.00     18.00     0.90   0.17   0.09  47.85
>> sdc        0.00  443.50  0.00 5561.50     0.00  97324.00     17.50     4.06   0.73   0.09  47.90
>> dm-0       0.00    0.00  0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00   0.00
>> dm-1       0.00    0.00  0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00   0.00
>>
>> Device:  rrqm/s  wrqm/s   r/s     w/s    rsec/s    wsec/s  avgrq-sz avgqu-sz  await  svctm  %util
>> sda        0.00   17.50  0.00   28.00     0.00   3948.00    141.00     0.01   0.29   0.05   0.15
>> sdb        0.00   69.50  0.00 4932.00     0.00  87067.50     17.65     2.27   0.46   0.09  43.45
>> sdc        0.00   69.00  0.00 4855.50     0.00 105771.50     21.78     0.95   0.20   0.10  46.40
>> dm-0       0.00    0.00  0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00   0.00
>> dm-1       0.00    0.00  0.00   42.50     0.00   3948.00     92.89     0.01   0.19   0.04   0.15
>>
>> Device:  rrqm/s  wrqm/s   r/s     w/s    rsec/s    wsec/s  avgrq-sz avgqu-sz  await  svctm  %util
>> sda        0.00   12.00  0.00    8.00     0.00    568.00     71.00     0.00   0.12   0.12   0.10
>> sdb        0.00   72.50  0.00 5046.50     0.00 113198.50     22.43     1.09   0.22   0.10  51.40
>> sdc        0.00   72.50  0.00 4912.00     0.00  91204.50     18.57     2.25   0.46   0.09  43.60
>> dm-0       0.00    0.00  0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00   0.00
>> dm-1       0.00    0.00  0.00   18.00     0.00    568.00     31.56     0.00   0.17   0.06   0.10
>>
>>
>>
>> Regards,
>> Mark Wu
>>
>>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
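For what it's worth, the arithmetic behind the quoted expectations can be sketched as follows; the 100-PGs-per-osd rule of thumb is an assumption on my part, not something stated in the thread:

```python
import math

# Observed: ~44k aggregate 4k random-write IOPS on the 14-osd cluster.
observed_iops = 44_000
osds_small, osds_large = 14, 30

# Linear scaling from 14 to 30 osds would predict roughly 94k IOPS,
# close to the ~90k expectation mentioned in the thread.
expected_iops = observed_iops * osds_large / osds_small
print(f"expected on {osds_large} osds: ~{expected_iops / 1000:.0f}k IOPS")

# PG sizing rule of thumb (assumed: ~100 PGs per osd, divided by the
# replica count, rounded up to a power of two) lands on the 2048 pgs
# the pool in the setup actually uses.
replicas = 2
raw = osds_large * 100 / replicas
pg_num = 2 ** math.ceil(math.log2(raw))
print(f"suggested pg_num for {osds_large} osds, {replicas} replicas: {pg_num}")
```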
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
