Hi,

I wouldn't put those SSDs in RAID; use them separately instead, each as journal device for half of your HDDs. That should improve your write performance somewhat.
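
Just to illustrate, a rough sketch of what that could look like in ceph.conf, assuming the two SSDs in one datastore node show up as /dev/sdc and /dev/sdd (hypothetical device names) and each carries five small journal partitions:

[osd.0]
       host = datastore1
       osd journal = /dev/sdc1

[osd.1]
       host = datastore1
       osd journal = /dev/sdc2

# ... osd.2-4 continue on /dev/sdc3-5 ...

[osd.5]
       host = datastore1
       osd journal = /dev/sdd1

# ... osd.6-9 continue on /dev/sdd2-5 ...

Moving an existing journal roughly means: stop the OSD, flush the old journal (ceph-osd -i <id> --flush-journal), point "osd journal" at the new partition, create it (ceph-osd -i <id> --mkjournal) and start the OSD again.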

On 04.07.2014 at 11:13, Marco Allevato <[email protected]> wrote:


Hello Ceph-Community,


I'm writing here because we are seeing poor write performance on our Ceph cluster of about

As an overview, the technical details of our cluster:


3 x monitoring servers; each with 2 x 1 Gbit/s NICs configured as a bond (link-aggregation mode)


5 x datastore servers; each with 10 x 4 TB HDDs serving as OSDs; as journals we use 15 GB LVM volumes on a 256 GB SSD RAID 1; 2 x 10 Gbit/s NICs configured as a bond (link-aggregation mode)


ceph.conf


[global]

auth_service_required = cephx

filestore_xattr_use_omap = true

auth_client_required = cephx

auth_cluster_required = cephx

mon_host = 172.30.30.8,172.30.30.9

mon_initial_members = monitoring1, monitoring2, monitoring3

fsid = 5f22ab94-8d96-48c2-88d3-cff7bad443a9

public network = 172.30.30.0/24

[mon.monitoring1]

       host = monitoring1

       addr = 172.30.30.8:6789


[mon.monitoring2]

       host = monitoring2

       addr = 172.30.30.9:6789


[mon.monitoring3]

       host = monitoring3

       addr = 172.30.30.10:6789


[filestore]

      filestore max sync interval = 10


[osd]

       osd recovery max active = 1

       osd journal size = 15360

       osd op threads = 40

       osd disk threads = 40


[osd.0]

       host = datastore1


[osd.1]

       host = datastore1


[osd.2]

       host = datastore1


[osd.3]

       host = datastore1


[osd.4]

       host = datastore1


[osd.5]

       host = datastore1


[osd.6]

       host = datastore1


[osd.7]

       host = datastore1


[osd.8]

       host = datastore1


[osd.9]

       host = datastore1


[osd.10]

       host = datastore2


[osd.11]

       host = datastore2


[osd.12]

       host = datastore2


[osd.13]

       host = datastore2


[osd.14]

       host = datastore2


[osd.15]

       host = datastore2


[osd.16]

       host = datastore2


[osd.17]

       host = datastore2


[osd.18]

       host = datastore2


[osd.19]

       host = datastore2


[osd.20]

       host = datastore3


[osd.21]

       host = datastore3


[osd.22]

       host = datastore3


[osd.23]

       host = datastore3


[osd.24]

       host = datastore3


[osd.25]

       host = datastore3


[osd.26]

       host = datastore3


[osd.27]

       host = datastore3


[osd.28]

       host = datastore3


[osd.29]

       host = datastore3


[osd.30]

       host = datastore4


[osd.31]

       host = datastore4


[osd.32]

       host = datastore4


[osd.33]

       host = datastore4


[osd.34]

       host = datastore4


[osd.35]

       host = datastore4


[osd.36]

       host = datastore4


[osd.37]

       host = datastore4


[osd.38]

       host = datastore4


[osd.39]

       host = datastore4


[osd.0]

       host = datastore5


[osd.40]

       host = datastore5


[osd.41]

       host = datastore5


[osd.42]

       host = datastore5


[osd.43]

       host = datastore5


[osd.44]

       host = datastore5


[osd.45]

       host = datastore5


[osd.46]

       host = datastore5


[osd.47]

       host = datastore5


[osd.48]

       host = datastore5



We have 3 pools:

-> 2 pools with 1000 PGs each and 2 replicas, distributing the data equally across two racks (used for datastores 1-4)

-> 1 pool with 100 PGs and no replication; data is stored only on datastore5. This pool is used to compare the performance of local disks without the network.
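
For reference, pools of that shape can be created roughly as follows (the pool names are placeholders, and the rack-aware distribution would additionally need a matching CRUSH rule, which is left out here):

ceph osd pool create pool1 1000 1000        # replicated pool, 1000 PGs
ceph osd pool set pool1 size 2              # two replicas
ceph osd pool create pool2 1000 1000
ceph osd pool set pool2 size 2
ceph osd pool create pool-local 100 100     # benchmark pool kept on datastore5
ceph osd pool set pool-local size 1         # no replication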



Here are the performance values I get using an fio benchmark on a 32 GB RBD:



On the 1000-PG pool with distribution:


fio --bs=1M --rw=randwrite --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 --name=/dev/rbd/pool1/bench1


fio-2.0.13

Starting 1 process

Jobs: 1 (f=1): [w] [100.0% done] [0K/312.0M/0K /s] [0 /312 /0 iops] [eta 00m:00s]

/dev/rbd/pool1/bench1: (groupid=0, jobs=1): err= 0: pid=21675: Fri Jul 4 11:03:52 2014

 write: io=21071MB, bw=358989KB/s, iops=350 , runt= 60104msec

   slat (usec): min=127 , max=8040 , avg=511.49, stdev=216.27

   clat (msec): min=5 , max=4018 , avg=90.74, stdev=215.83

    lat (msec): min=6 , max=4018 , avg=91.25, stdev=215.83

   clat percentiles (msec):

    |  1.00th=[    8],  5.00th=[    9], 10.00th=[   11], 20.00th=[   15],

    | 30.00th=[   21], 40.00th=[   30], 50.00th=[   45], 60.00th=[   63],

    | 70.00th=[   83], 80.00th=[  105], 90.00th=[  129], 95.00th=[  190],

    | 99.00th=[ 1254], 99.50th=[ 1680], 99.90th=[ 2409], 99.95th=[ 2638],

    | 99.99th=[ 3556]

    bw (KB/s)  : min=68210, max=479232, per=100.00%, avg=368399.55, stdev=84457.12

    lat (msec) : 10=9.50%, 20=20.02%, 50=23.56%, 100=24.56%, 250=18.09%

    lat (msec) : 500=1.39%, 750=0.81%, 1000=0.65%, 2000=1.13%, >=2000=0.29%

  cpu          : usr=11.17%, sys=7.46%, ctx=17772, majf=0, minf=24

  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%

     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%

     issued    : total=r=0/w=21071/d=0, short=r=0/w=0/d=0


Run status group 0 (all jobs):

WRITE: io=21071MB, aggrb=358989KB/s, minb=358989KB/s, maxb=358989KB/s, mint=60104msec, maxt=60104msec



On the 100-PG pool without distribution:


WRITE: io=5884.0MB, aggrb=297953KB/s, minb=297953KB/s, maxb=297953KB/s, mint=20222msec, maxt=20222msec
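
To see whether the bottleneck sits in the RBD/fio layer or in RADOS itself, it may also be worth comparing against rados bench on the same pools, roughly like this (the pool name is a placeholder):

rados bench -p pool1 60 write -t 32 --no-cleanup   # 60 s of 4 MB object writes, 32 in flight
rados bench -p pool1 60 seq -t 32                  # sequential reads of the objects written above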



Do you have any suggestions on how to improve the performance?

From what I have read on the internet, typical write rates should be around 800-1000 MB/s with a 10 Gbit/s connection and a similar setup.
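
It may also be worth confirming, independently of Ceph, that the bonded 10 Gbit/s links really deliver that rate between the datastore hosts; a quick check with iperf (addresses are placeholders) would be roughly:

# on one datastore node
iperf -s

# on another datastore node; -P 4 runs four parallel streams, since with
# link aggregation a single TCP stream normally stays on one physical link
iperf -c <ip-of-first-datastore> -P 4 -t 30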



Thanks in advance


--

Marco Allevato
Projektteam


Network Engineering GmbH
Maximilianstrasse 93
D-67346 Speyer






--

Konrad Gutkowski
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
