If you're running a single client to drive these tests, that's your bottleneck. Try running multiple clients and aggregating their numbers. -Greg
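For instance (a rough sketch, not from the thread — the hostnames and per-client numbers below are hypothetical), you would run the same 4k random-write job from several client hosts at once, sum the per-client results, and compare against a naive linear extrapolation from the 14-OSD baseline:

```python
# Hypothetical per-client 4k randwrite IOPS, e.g. parsed from fio output
# on four separate client hosts driving the cluster concurrently.
per_client_iops = {
    "client1": 23500,
    "client2": 22800,
    "client3": 24100,
    "client4": 23900,
}

def aggregate_iops(results):
    """Cluster-wide IOPS is the sum over all concurrent clients."""
    return sum(results.values())

# Naive linear extrapolation from the thread's 14-OSD baseline (44k
# aggregate IOPS); ignores replication and network limits, so it is an
# upper bound rather than a prediction.
baseline_iops, baseline_osds, target_osds = 44_000, 14, 30
expected = baseline_iops * target_osds / baseline_osds

print(f"measured aggregate: {aggregate_iops(per_client_iops)} IOPS")
print(f"linear-scaling expectation for {target_osds} OSDs: {expected:.0f} IOPS")
```

If the aggregate across clients keeps climbing as you add clients while per-client numbers drop, the cluster wasn't saturated and the single client was the limit.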
On Thursday, October 16, 2014, Mark Wu <[email protected]> wrote:
> Hi list,
>
> During my test, I found Ceph doesn't scale as I expected on a 30-OSD
> cluster. The following is the information of my setup:
>
> HW configuration:
> 15 Dell R720 servers, each with:
>   Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, 20 cores, hyper-threading
>   enabled
>   128GB memory
>   two Intel 3500 SSD disks, connected to a MegaRAID SAS 2208 controller;
>   each disk is configured as a separate RAID0
>   two 10GbE NICs in a bond, used for both the public network and the
>   cluster network
>
> SW configuration:
> OS CentOS 6.5, kernel 3.17, Ceph 0.86
> XFS as the file system for data.
> Each SSD disk has two partitions: one is the OSD data, the other the OSD
> journal.
> The pool has 2048 PGs, 2 replicas.
> 5 monitors running on 5 of the 15 servers.
> Ceph configuration (in-memory debugging options are disabled):
>
> [osd]
> osd data = /var/lib/ceph/osd/$cluster-$id
> osd journal = /var/lib/ceph/osd/$cluster-$id/journal
> osd mkfs type = xfs
> osd mkfs options xfs = -f -i size=2048
> osd mount options xfs = rw,noatime,logbsize=256k,delaylog
> osd journal size = 20480
> osd mon heartbeat interval = 30
> # Performance tuning
> osd_max_backfills = 10
> osd_recovery_max_active = 15
> filestore merge threshold = 40
> filestore split multiple = 8
> filestore fd cache size = 1024
> osd op threads = 64
> # Recovery tuning
> osd recovery max active = 1
> osd max backfills = 1
> osd recovery op priority = 1
> throttler perf counter = false
> osd enable op tracker = false
> filestore_queue_max_ops = 5000
> filestore_queue_committing_max_ops = 5000
> journal_max_write_entries = 1000
> journal_queue_max_ops = 5000
> objecter_inflight_ops = 8192
>
> When I test with 7 servers (14 OSDs), the maximum 4k random write IOPS I
> saw is 17k on a single volume and 44k on the whole cluster.
> I expected the number for a 30-OSD cluster to approach 90k.
> But unfortunately, I found that with 30 OSDs it provides almost the same
> performance as 14 OSDs, sometimes even worse. I checked the iostat output
> on all the nodes, which show similar numbers. The load is well
> distributed, but disk utilization is low.
> In the test with 14 OSDs, I can see higher disk utilization (80%~90%).
> So do you have any tuning suggestions to improve the performance with 30
> OSDs?
> Any feedback is appreciated.
>
> iostat output:
>
> Device:  rrqm/s   wrqm/s   r/s     w/s      rsec/s   wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
> sda      0.00     0.00     0.00    0.00     0.00     0.00       0.00      0.00      0.00   0.00   0.00
> sdb      0.00     88.50    0.00    5188.00  0.00     93397.00   18.00     0.90      0.17   0.09   47.85
> sdc      0.00     443.50   0.00    5561.50  0.00     97324.00   17.50     4.06      0.73   0.09   47.90
> dm-0     0.00     0.00     0.00    0.00     0.00     0.00       0.00      0.00      0.00   0.00   0.00
> dm-1     0.00     0.00     0.00    0.00     0.00     0.00       0.00      0.00      0.00   0.00   0.00
>
> Device:  rrqm/s   wrqm/s   r/s     w/s      rsec/s   wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
> sda      0.00     17.50    0.00    28.00    0.00     3948.00    141.00    0.01      0.29   0.05   0.15
> sdb      0.00     69.50    0.00    4932.00  0.00     87067.50   17.65     2.27      0.46   0.09   43.45
> sdc      0.00     69.00    0.00    4855.50  0.00     105771.50  21.78     0.95      0.20   0.10   46.40
> dm-0     0.00     0.00     0.00    0.00     0.00     0.00       0.00      0.00      0.00   0.00   0.00
> dm-1     0.00     0.00     0.00    42.50    0.00     3948.00    92.89     0.01      0.19   0.04   0.15
>
> Device:  rrqm/s   wrqm/s   r/s     w/s      rsec/s   wsec/s     avgrq-sz  avgqu-sz  await  svctm  %util
> sda      0.00     12.00    0.00    8.00     0.00     568.00     71.00     0.00      0.12   0.12   0.10
> sdb      0.00     72.50    0.00    5046.50  0.00     113198.50  22.43     1.09      0.22   0.10   51.40
> sdc      0.00     72.50    0.00    4912.00  0.00     91204.50   18.57     2.25      0.46   0.09   43.60
> dm-0     0.00     0.00     0.00    0.00     0.00     0.00       0.00      0.00      0.00   0.00   0.00
> dm-1     0.00     0.00     0.00    18.00    0.00     568.00     31.56     0.00      0.17   0.06   0.10
>
> Regards,
> Mark Wu

-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
