OK, thanks for sharing. Yes, my journals are Intel S3610 200GB drives,
which I split into 4 partitions of ~45GB each. When I run ceph-deploy, I
declare these partitions as the journals for the OSDs.
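As a side note, the Ceph documentation's rule of thumb for journal sizing is 2 x expected throughput x filestore max sync interval. A quick sketch of that arithmetic (the throughput figure and sync interval below are assumptions for illustration, not measurements from this cluster):

```python
# Hedged sketch of Ceph's documented journal sizing rule of thumb:
# journal size >= 2 * expected throughput * filestore max sync interval.
def journal_size_mb(throughput_mb_s, sync_interval_s=5):
    # 5 s is the filestore max sync interval default; adjust if tuned.
    return 2 * throughput_mb_s * sync_interval_s

# A 10k rpm SAS drive streams very roughly 150 MB/s:
print(journal_size_mb(150))  # 1500 MB -- far below a 45 GB partition
```

So the ~45GB partitions leave enormous headroom by that rule; the unused SSD space still helps wear leveling, though.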
I was trying to understand the blocking behavior and how much my SAS
OSDs affect my performance. I have a total of 9 hosts and 158 OSDs, each
1.8TB. The servers are connected through copper 10Gbit LACP bonds.
My failure domain is rack: the CRUSH rule distributes replicas by rack,
with 3 hosts in each rack, and pool size is 3. I'm running Hammer on CentOS 7.
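For reference, a CRUSH rule along these lines would look roughly as below (a sketch only; the rule name and ruleset number are placeholders, not taken from this cluster's actual CRUSH map):

```
rule replicated_by_rack {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
}
```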
I did a simple fio test from one of my xl instances and got the
results below. The 7.21ms average latency is worrying. Is this an
expected result, or is there any way I can further tune my cluster to
achieve better results? thx will
FIO: sync=1, direct=1, bs=4k
write-50: (groupid=11, jobs=50): err= 0: pid=3945: Sun Oct 16 08:41:15 2016
write: io=832092KB, bw=27721KB/s, iops=6930, runt= 30017msec
clat (msec): min=2, max=253, avg= 7.21, stdev= 4.97
lat (msec): min=2, max=253, avg= 7.21, stdev= 4.97
clat percentiles (msec):
| 1.00th=[ 4], 5.00th=[ 4], 10.00th=[ 5], 20.00th=[ 5],
| 30.00th=[ 5], 40.00th=[ 6], 50.00th=[ 7], 60.00th=[ 8],
| 70.00th=[ 9], 80.00th=[ 10], 90.00th=[ 12], 95.00th=[ 14],
| 99.00th=[ 17], 99.50th=[ 19], 99.90th=[ 21], 99.95th=[ 23],
| 99.99th=[ 253]
bw (KB /s): min= 341, max= 870, per=2.01%, avg=556.60, stdev=136.98
lat (msec) : 4=8.24%, 10=74.10%, 20=17.52%, 50=0.12%, 500=0.02%
cpu : usr=0.04%, sys=0.23%, ctx=425242, majf=0, minf=1570
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=208023/d=0, short=r=0/w=0/d=0
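As a sanity check on those numbers (assuming the 50 jobs each ran at iodepth=1, as the "IO depths : 1=100.0%" line suggests), aggregate IOPS should work out to roughly the job count divided by the average latency:

```python
# With 50 synchronous jobs at iodepth=1, each job has exactly one 4k
# write in flight, so aggregate IOPS ~= jobs / average latency.
jobs = 50
avg_lat_s = 0.00721            # the 7.21 ms average clat reported above
predicted_iops = jobs / avg_lat_s
print(round(predicted_iops))   # 6935, in line with the measured 6930
```

So the IOPS and latency figures are internally consistent: the cluster is latency-bound per request, and higher aggregate IOPS would come from more parallelism, not from these jobs going faster.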
On Sun, Oct 16, 2016 at 4:18 PM, Christian Balzer <ch...@gol.com> wrote:
> On Sun, 16 Oct 2016 15:03:24 +0800 William Josefsson wrote:
>> Hi list, while I know that writes in the RADOS backend are sync(), can
>> anyone please explain when the cluster will return on a write call for
>> RBD from VMs? Will data be considered synced once written to the
>> journal, or only once written all the way to the OSD drive?
> This has been answered countless times (really) here; the Ceph Architecture
> documentation should really be more detailed about this, as well as about
> how the data is sent to the secondary OSDs in parallel.
> It is of course ack'ed to the client once all journals have successfully
> written the data, otherwise journal SSDs would make a LOT less sense.
>> Each host in my cluster has 5x Intel S3610, and 18x1.8TB Hitachi 10krpm SAS.
> The size of your SSDs (which you didn't mention) will determine the speed;
> for journal purposes, sequential write speed is basically all that matters.
> A 5:18 ratio implies that some of your SSDs hold more journals than others.
> You emphatically do NOT want that, because eventually the busier ones will
> run out of endurance while the other ones still have plenty left.
> If possible change this to a 5:20 or 6:18 ratio (depending on your SSDs
> and expected write volume).
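To make the unevenness concrete, here is a quick sketch using the 5 SSDs and 18 OSDs per host mentioned above:

```python
# Spreading 18 journals across 5 SSDs as evenly as possible still leaves
# some SSDs carrying one more journal than the others.
osds, ssds = 18, 5
per_ssd = [osds // ssds + (1 if i < osds % ssds else 0) for i in range(ssds)]
print(per_ssd)  # [4, 4, 4, 3, 3] -- the 4-journal SSDs take ~33% more writes
```

Those 4-journal SSDs accumulate write wear faster and will hit their endurance limit first, which is exactly the imbalance Christian describes.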
>> I have size=3 for my pool. Will Ceph return once the data is written
>> to at least 3 designated journals, or will it in fact wait until the
>> data is written to the OSD drives? thx will
>> ceph-users mailing list
> Christian Balzer Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications