Hello Ceph community,
I'm writing here because we are seeing poor write performance on our
Ceph cluster (numbers below).
Here is an overview of the technical details of our cluster:
3 x monitoring servers, each with 2 x 1 Gbit/s NICs configured as a
bond (link-aggregation mode)
5 x datastore servers, each with 10 x 4 TB HDDs serving as OSDs; as
journal we use a 15 GB LVM volume on a 256 GB SSD RAID 1; 2 x 10 Gbit/s
NICs configured as a bond (link-aggregation mode)
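To rule out the network as the bottleneck, I can run a raw throughput
test between two datastore servers over the bonds; a minimal check with
iperf (the hostname is just an example):

# on the receiving datastore
iperf -s
# on the sending datastore, 60-second test against it
iperf -c datastore1 -t 60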
ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.30.30.8,172.30.30.9,172.30.30.10
mon_initial_members = monitoring1, monitoring2, monitoring3
fsid = 5f22ab94-8d96-48c2-88d3-cff7bad443a9
public network = 172.30.30.0/24
[mon.monitoring1]
host = monitoring1
addr = 172.30.30.8:6789
[mon.monitoring2]
host = monitoring2
addr = 172.30.30.9:6789
[mon.monitoring3]
host = monitoring3
addr = 172.30.30.10:6789
[filestore]
filestore max sync interval = 10
[osd]
osd recovery max active = 1
osd journal size = 15360
osd op threads = 40
osd disk threads = 40
[osd.0]
host = datastore1
[osd.1]
host = datastore1
[osd.2]
host = datastore1
[osd.3]
host = datastore1
[osd.4]
host = datastore1
[osd.5]
host = datastore1
[osd.6]
host = datastore1
[osd.7]
host = datastore1
[osd.8]
host = datastore1
[osd.9]
host = datastore1
[osd.10]
host = datastore2
[osd.11]
host = datastore2
[osd.12]
host = datastore2
[osd.13]
host = datastore2
[osd.14]
host = datastore2
[osd.15]
host = datastore2
[osd.16]
host = datastore2
[osd.17]
host = datastore2
[osd.18]
host = datastore2
[osd.19]
host = datastore2
[osd.20]
host = datastore3
[osd.21]
host = datastore3
[osd.22]
host = datastore3
[osd.23]
host = datastore3
[osd.24]
host = datastore3
[osd.25]
host = datastore3
[osd.26]
host = datastore3
[osd.27]
host = datastore3
[osd.28]
host = datastore3
[osd.29]
host = datastore3
[osd.30]
host = datastore4
[osd.31]
host = datastore4
[osd.32]
host = datastore4
[osd.33]
host = datastore4
[osd.34]
host = datastore4
[osd.35]
host = datastore4
[osd.36]
host = datastore4
[osd.37]
host = datastore4
[osd.38]
host = datastore4
[osd.39]
host = datastore4
[osd.40]
host = datastore5
[osd.41]
host = datastore5
[osd.42]
host = datastore5
[osd.43]
host = datastore5
[osd.44]
host = datastore5
[osd.45]
host = datastore5
[osd.46]
host = datastore5
[osd.47]
host = datastore5
[osd.48]
host = datastore5
[osd.49]
host = datastore5
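To make sure the tuning values above are actually active, I check a
running OSD via its admin socket (osd.0 is just an example):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_op_threads|osd_disk_threads|filestore_max_sync_interval'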
We have 3 pools:
-> 2 pools with 1000 PGs each and 2 replicas, distributing the data
equally across two racks (used for datastores 1-4)
-> 1 pool with 100 PGs and no replication; the data is stored only on
datastore5. This pool is used to compare the performance on local disks
without networking.
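For completeness, the PG counts, replica sizes, and the rack placement
can be verified with:

ceph osd dump | grep pool    # shows pg_num and replica size per pool
ceph osd tree                # shows the host/rack layout of the OSDs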
Here are the performance values I get using fio on a 32 GB RBD:
On the 1000-PG pool with distribution:
fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32
--runtime=60 --name=/dev/rbd/pool1/bench1
fio-2.0.13
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/312.0M/0K /s] [0 /312 /0 iops]
[eta 00m:00s]
/dev/rbd/pool1/bench1: (groupid=0, jobs=1): err= 0: pid=21675: Fri Jul
4 11:03:52 2014
write: io=21071MB, bw=358989KB/s, iops=350 , runt= 60104msec
slat (usec): min=127 , max=8040 , avg=511.49, stdev=216.27
clat (msec): min=5 , max=4018 , avg=90.74, stdev=215.83
lat (msec): min=6 , max=4018 , avg=91.25, stdev=215.83
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 9], 10.00th=[ 11], 20.00th=[ 15],
| 30.00th=[ 21], 40.00th=[ 30], 50.00th=[ 45], 60.00th=[ 63],
| 70.00th=[ 83], 80.00th=[ 105], 90.00th=[ 129], 95.00th=[ 190],
| 99.00th=[ 1254], 99.50th=[ 1680], 99.90th=[ 2409], 99.95th=[ 2638],
| 99.99th=[ 3556]
bw (KB/s) : min=68210, max=479232, per=100.00%, avg=368399.55,
stdev=84457.12
lat (msec) : 10=9.50%, 20=20.02%, 50=23.56%, 100=24.56%, 250=18.09%
lat (msec) : 500=1.39%, 750=0.81%, 1000=0.65%, 2000=1.13%,
>=2000=0.29%
cpu : usr=11.17%, sys=7.46%, ctx=17772, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
>=64=0.0%
issued : total=r=0/w=21071/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
WRITE: io=21071MB, aggrb=358989KB/s, minb=358989KB/s, maxb=358989KB/s,
mint=60104msec, maxt=60104msec
On the 100-PG pool without distribution:
WRITE: io=5884.0MB, aggrb=297953KB/s, minb=297953KB/s, maxb=297953KB/s,
mint=20222msec, maxt=20222msec
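If it helps narrow this down, I can also benchmark the layers in
isolation, e.g. raw object writes with rados bench and the journal SSDs
directly with fio (the pool name and device path are examples; note
that the fio run below writes to the raw device and is destructive):

rados bench -p pool1 60 write --no-cleanup
fio --name=journaltest --filename=/dev/sdX --bs=1M --rw=write --ioengine=libaio --direct=1 --iodepth=32 --runtime=60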
Do you have any suggestions on how to improve the performance?
From what I have read on the internet, typical write rates should be
around 800-1000 MB/s when using a 10 Gbit/s connection with a similar
setup.
Thanks in advance
--
Marco Allevato
Project Team
Network Engineering GmbH
Maximilianstrasse 93
D-67346 Speyer