forgot to cc the list

---------- Forwarded message ----------
From: "Mark Wu" <[email protected]>
Date: Oct 17, 2014, 12:51 AM
Subject: Re: [ceph-users] Performance doesn't scale well on a full ssd cluster.
To: "Gregory Farnum" <[email protected]>
Cc:
Thanks for the reply. I am not using a single client: writing to 5 rbd volumes from 3 hosts is enough to reach the peak. The clients are fio processes, also running on the osd nodes, but there are no CPU or network bottlenecks (a sketch of such a job file follows the quoted message below). I also tried running the clients on two non-osd servers, with the same result.

On Oct 17, 2014, 12:29 AM, "Gregory Farnum" <[email protected]> wrote:

> If you're running a single client to drive these tests, that's your
> bottleneck. Try running multiple clients and aggregating their numbers.
> -Greg
>
> On Thursday, October 16, 2014, Mark Wu <[email protected]> wrote:
>
>> Hi list,
>>
>> During my testing, I found that Ceph doesn't scale as I expected on a
>> 30-osd cluster. Here is the information about my setup:
>>
>> HW configuration:
>>     15 Dell R720 servers, each with:
>>     Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores, hyper-threading enabled
>>     128GB memory
>>     two Intel 3500 SSD disks attached to a MegaRAID SAS 2208 controller,
>>     each disk configured as a separate raid0
>>     two 10GbE nics in a bond, used for both the public and cluster networks
>>
>> SW configuration:
>>     OS CentOS 6.5, kernel 3.17, Ceph 0.86
>>     XFS as the file system for data
>>     each SSD disk has two partitions: one for osd data, the other for the osd journal
>>     the pool has 2048 pgs and 2 replicas
>>     5 monitors running on 5 of the 15 servers
>>     Ceph configuration (in-memory debugging options are disabled):
>>
>> [osd]
>> osd data = /var/lib/ceph/osd/$cluster-$id
>> osd journal = /var/lib/ceph/osd/$cluster-$id/journal
>> osd mkfs type = xfs
>> osd mkfs options xfs = -f -i size=2048
>> osd mount options xfs = rw,noatime,logbsize=256k,delaylog
>> osd journal size = 20480
>> osd mon heartbeat interval = 30
>> # Performance tuning
>> osd_max_backfills = 10
>> osd_recovery_max_active = 15
>> filestore merge threshold = 40
>> filestore split multiple = 8
>> filestore fd cache size = 1024
>> osd op threads = 64
>> # Recovery tuning
>> osd recovery max active = 1
>> osd max backfills = 1
>> osd recovery op priority = 1
>> throttler perf counter = false
>> osd enable op tracker = false
>> filestore_queue_max_ops = 5000
>> filestore_queue_committing_max_ops = 5000
>> journal_max_write_entries = 1000
>> journal_queue_max_ops = 5000
>> objecter_inflight_ops = 8192
>>
>> When I test with 7 servers (14 osds), the maximum iops I see for 4k random
>> writes is 17k on a single volume and 44k on the whole cluster. I expected
>> the 30-osd cluster to approach 90k, but unfortunately it delivers roughly
>> the same performance as 14 osds, and sometimes worse. I checked the iostat
>> output on all the nodes and the numbers are similar, so the load is well
>> distributed, but disk utilization is low. In the 14-osd test I see higher
>> disk utilization (80%~90%). Do you have any tuning suggestions to improve
>> the performance with 30 osds?
>> Any feedback is appreciated.
>>
>> iostat output:
>> Device:  rrqm/s  wrqm/s   r/s      w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
>> sda        0.00    0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
>> sdb        0.00   88.50  0.00  5188.00      0.00   93397.00     18.00      0.90   0.17   0.09  47.85
>> sdc        0.00  443.50  0.00  5561.50      0.00   97324.00     17.50      4.06   0.73   0.09  47.90
>> dm-0       0.00    0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
>> dm-1       0.00    0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
>>
>> Device:  rrqm/s  wrqm/s   r/s      w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
>> sda        0.00   17.50  0.00    28.00      0.00    3948.00    141.00      0.01   0.29   0.05   0.15
>> sdb        0.00   69.50  0.00  4932.00      0.00   87067.50     17.65      2.27   0.46   0.09  43.45
>> sdc        0.00   69.00  0.00  4855.50      0.00  105771.50     21.78      0.95   0.20   0.10  46.40
>> dm-0       0.00    0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
>> dm-1       0.00    0.00  0.00    42.50      0.00    3948.00     92.89      0.01   0.19   0.04   0.15
>>
>> Device:  rrqm/s  wrqm/s   r/s      w/s    rsec/s     wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
>> sda        0.00   12.00  0.00     8.00      0.00     568.00     71.00      0.00   0.12   0.12   0.10
>> sdb        0.00   72.50  0.00  5046.50      0.00  113198.50     22.43      1.09   0.22   0.10  51.40
>> sdc        0.00   72.50  0.00  4912.00      0.00   91204.50     18.57      2.25   0.46   0.09  43.60
>> dm-0       0.00    0.00  0.00     0.00      0.00       0.00      0.00      0.00   0.00   0.00   0.00
>> dm-1       0.00    0.00  0.00    18.00      0.00     568.00     31.56      0.00   0.17   0.06   0.10
>>
>> Regards,
>> Mark Wu
>>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
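For anyone trying to reproduce the test: a minimal sketch of a 4k random write fio job against one rbd volume, assuming fio's rbd ioengine. The pool, volume, client name, queue depth and runtime below are placeholders, not the exact values from the test above; one such job is run per volume, on several hosts, and the iops are summed.

    [global]
    ioengine=rbd          ; librbd engine, no need to map the volume with krbd
    clientname=admin      ; cephx user (placeholder)
    pool=rbd              ; pool holding the test volumes (placeholder)
    rw=randwrite
    bs=4k
    time_based=1
    runtime=60            ; placeholder runtime
    iodepth=32            ; queue depth per job (placeholder)

    [volume1]
    rbdname=testvol1      ; placeholder volume name; repeat a section per volume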
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
