dd tests: results below for the slow VM host. Max throughput seems to cap at 
about 130M, and only in spikes.
sysbench: results below for the slow VM host.
Restarting iSCSI services: I have run the tests after restarting iSCSI, and 
even after restarting the VM, with no change in results. Note, however, that 
we have no iSCSI targets.

You wrote "I'm looking for ... at least 600-700MB/s of throughput per thread."
Are you saying that if properly configured/architected, the dd tests on our VMs 
should yield this amount of throughput, and not the 130M I am seeing?

If the VM's dd-test speed and dstat-reported throughput are the reference, 
then:
The write speed for our slowest OSD, using the same dd test, is about 2x the 
VM's dd-test speed and 2x its dstat-reported throughput (100M).
The write speed for our fastest OSD, using the same dd test, is about 3x the 
VM's dd-test speed and 3x its dstat-reported throughput (150M).
(See results below for the slowest OSD.)

We have 16 OSDs across 4 storage hosts. Does that mean my best-case maximum 
throughput would be 100M x 16, or 100M x 4?
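
One rough way to frame the 16-OSD question is as arithmetic rather than a measurement. The sketch below is only an upper-bound estimate under assumptions that this thread does not state: the replica count (`replicas`) and whether each OSD journal shares the data disk (`journal_on_same_disk`, which would mean each replica is written twice) are hypothetical parameters, and network, CPU and per-client limits are ignored.

```python
def aggregate_write_ceiling(per_osd_mb_s, n_osds, replicas, journal_on_same_disk):
    """Rough ceiling on cluster-wide client write throughput in MB/s.

    Every client byte is written `replicas` times, and twice per replica
    when the journal shares the data disk (assumed, not from the thread).
    """
    writes_per_client_byte = replicas * (2 if journal_on_same_disk else 1)
    return per_osd_mb_s * n_osds / writes_per_client_byte


# 100 MB/s per OSD and 16 OSDs come from this thread; the rest is assumed.
for replicas in (1, 2, 3):
    ceiling = aggregate_write_ceiling(100, 16, replicas, journal_on_same_disk=True)
    print(f"replicas={replicas}: about {ceiling:.0f} MB/s across all clients")
```

Any figure from this is a ceiling for all clients together; a single dd thread inside one VM may well see much less.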

-RG


--- results of dd test with dstat report for slowest OSD ---

[root@ceph1host ceph-15]# dd if=/dev/zero of=./disk-test bs=65536k count=26 oflag=direct
26+0 records in
26+0 records out
1744830464 bytes (1.7 GB) copied, 17.8665 s, 97.7 MB/s

[root@ceph1host ~]# dstat --all
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  3   1  96   0   0   0|2162k  861k|   0     0 |1036B 1058B|2186  2300 
  0   0 100   0   0   0|   0   376k|  22k   22k|   0     0 | 829   914 
  4   0  95   1   0   0|2336k 1922k| 409k   67k|   0     0 |2690  3293 
 10   1  88   1   0   0|6872k 9604k|  55k   30k|   0     0 |1840  1007 
  0   4  93   4   0   0|  52k   83M|  51k   33k|   0     0 |1615  1034 
  0   1  73  27   0   0|   0    46M|  50k   39k|   0     0 |1326  1118 
  0   1  23  76   0   0|  12k   70M| 241k  103k|   0     0 |2035  1909 
  1   1  53  46   0   0|  84k  114M| 227k   59k|   0     0 |1953  1828 
  0   1  72  27   0   0|8192B   78M|  18k   27k|   0     0 |1135   759 
  0   1  50  49   0   0|   0    59M|  14k   12k|   0     0 | 908   540 
  0   1  49  50   0   0|8192B  124M|  96k   55k|   0     0 |2747  2171 
  0   1  36  63   0   0|4096B   52M|  54k   41k|   0     0 |1312  1066 
  1   2  27  71   0   0|1044k  146M|1152k   49k|   0     0 |2081  1558 
  0   1  50  49   0   0|8192B   86M|  68k   28k|   0     0 |1490  1070


--- results of dd test with dstat report for slow VM host---

[root@slow1host disk-test]# dd if=/dev/zero of=./$RANDOM bs=4k count=220000 oflag=direct
220000+0 records in
220000+0 records out
901120000 bytes (901 MB) copied, 527.177 seconds, 1.7 MB/s
[root@slow1host disk-test]# dd if=/dev/zero of=./$RANDOM bs=8k count=140000 oflag=direct
140000+0 records in
140000+0 records out
1146880000 bytes (1.1 GB) copied, 351.438 seconds, 3.3 MB/s
[root@slow1host disk-test]# dd if=/dev/zero of=./$RANDOM bs=16k count=90000 oflag=direct
90000+0 records in
90000+0 records out
1474560000 bytes (1.5 GB) copied, 233.742 seconds, 6.3 MB/s
[root@slow1host disk-test]# dd if=/dev/zero of=./$RANDOM bs=32k count=40000 oflag=direct
40000+0 records in
40000+0 records out
1310720000 bytes (1.3 GB) copied, 111.963 seconds, 11.7 MB/s
[root@slow1host disk-test]# dd if=/dev/zero of=./$RANDOM bs=32768k count=90 oflag=direct
90+0 records in
90+0 records out
3019898880 bytes (3.0 GB) copied, 50.1434 seconds, 60.2 MB/s
[root@slow1host disk-test]# dd if=/dev/zero of=./$RANDOM bs=65536k count=26 oflag=direct
26+0 records in
26+0 records out
1744830464 bytes (1.7 GB) copied, 34.3239 seconds, 50.8 MB/s
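
The six dd results above can also be read as writes per second and time per write, which makes the small-block runs easier to interpret. This is a minimal sketch that only re-derives figures from the block sizes, counts and elapsed times printed above; the function name `summarize` is just an illustrative helper.

```python
# (block size in bytes, record count, elapsed seconds) from the dd runs above
DD_RUNS = [
    (4 * 1024, 220000, 527.177),
    (8 * 1024, 140000, 351.438),
    (16 * 1024, 90000, 233.742),
    (32 * 1024, 40000, 111.963),
    (32768 * 1024, 90, 50.1434),
    (65536 * 1024, 26, 34.3239),
]


def summarize(block_bytes, count, seconds):
    """Return (MB/s, writes per second, ms per write) for one dd run."""
    mb_per_s = block_bytes * count / seconds / 1e6  # dd reports decimal MB
    writes_per_s = count / seconds
    ms_per_write = seconds / count * 1000
    return mb_per_s, writes_per_s, ms_per_write


for block_bytes, count, seconds in DD_RUNS:
    mb_s, wps, ms = summarize(block_bytes, count, seconds)
    print(f"bs={block_bytes // 1024}k: {mb_s:.1f} MB/s, "
          f"{wps:.0f} writes/s, {ms:.2f} ms/write")
```

For the four small block sizes this works out to roughly 2.4 to 2.8 ms per write regardless of block size, which suggests those runs are bound by per-write latency rather than by bandwidth.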

1. dd if=/dev/zero of=./$RANDOM bs=4k count=220000 oflag=direct 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   1   0  98   1   0|   0  6592k| 486B  322B|   0     0 |1421   852 
  0   2   0  98   0   0|   0  6288k| 366B  818B|   0     0 |1401   806 
  1   1   0  97   1   0|   0  6112k| 486B  322B|   0     0 |1392   783 
  0   2   0  97   1   0|   0  6368k| 546B  322B|   0     0 |1408   820 
  0   1   0  99   0   0|   0  6528k| 186B  322B|   0     0 |1412   834 
  0   1   0  97   1   1|   0  5600k| 606B  322B|   0     0 |1362   726 
  0   1   0  98   1   0|   0  6720k| 246B  322B|   0     0 |1427   863 
  0   2   0  98   0   0|   0  7168k| 246B  322B|   0     0 |1453   918 
  0   2   0  97   1   0|   0  6496k| 546B  322B|   0     0 |1419   831 
  1   2   0  97   0   0|   0  6240k| 486B  322B|   0     0 |1400   800 

2. dd if=/dev/zero of=./$RANDOM bs=8k count=140000 oflag=direct 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   1   0   0| 380B   13k|   0     0 |   0     0 |1007    22 
  0   2   0  98   0   0|   0  6000k| 384B  306B|   0     0 |1384   778 
  0   1   0  98   1   0|   0  6488k| 426B  322B|   0     0 |1391   790 
  0   2   0  98   0   0|   0  8304k| 486B  322B|   0     0 |1530  1060 
  0   0   0  99   1   0|   0  6240k| 306B  322B|   0     0 |1398   802 
  0   1   0  99   0   0|   0  5520k| 666B  322B|   0     0 |1359   712 
  0   2   0  97   1   0|   0  5792k| 432B  420B|   0     0 |1372   748 
  0   2   0  97   1   0|   0  6752k| 486B  322B|   0     0 |1431   866 
  0   3   0  97   0   0|   0  8008k| 486B  420B|   0     0 |1511  1090 
  1   3   0  95   1   0|   0  8400k| 426B  322B|   0     0 |1535  1075 

3. dd if=/dev/zero of=./$RANDOM bs=16k count=90000 oflag=direct
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   1   0   0| 380B   14k|   0     0 |   0     0 |1007    22 
  1   2   0  97   0   0|   0    13M| 924B  306B|   0     0 |1418   863 
  0   1   0  98   1   0|   0    13M| 366B  322B|   0     0 |1420   849 
  0   1   0  99   0   0|   0    12M| 546B  322B|   0     0 |1387   773 
  0   2   0  97   1   0|   0    12M| 366B  322B|   0     0 |1402   814 
  0   2   0  98   0   0|   0    11M| 486B  322B|   0     0 |1377   753 
  0   1   0  99   0   0|   0    11M| 546B  322B|   0     0 |1361   731 
  0   2   0  98   0   0|   0    13M| 306B  322B|   0     0 |1411   834 
  0   1   0  98   1   0|   0    14M| 306B  322B|   0     0 |1442   891 
  0   3   0  97   0   0|   0    12M| 546B  322B|   0     0 |1406   815 

4. dd if=/dev/zero of=./$RANDOM bs=32k count=40000 oflag=direct 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   1   0   0| 380B   19k|   0     0 |   0     0 |1007    22 
  0   1   0  98   1   0|   0    89M| 558B  642B|   0     0 |1153    47 
  0   1   0  99   0   0|   0    73M| 666B  322B|   0     0 |1130    38 
  0   1   0  98   1   0|   0    65M| 366B  322B|   0     0 |1109    48 
  0   0   0 100   0   0|   0    50M| 786B  322B|   0     0 |1105    30 
  0   0   0 100   0   0|   0    44M| 306B  322B|   0     0 |1074    26 
  0   2   0  97   1   0|   0    88M| 606B  322B|   0     0 |1150    40 
  1   1   0  98   0   0|   0    88M| 432B  420B|   0     0 |1154    42 
  0   1   0  99   0   0|   0    69M| 306B  322B|   0     0 |1118    42 
  0   1   0  98   1   0|   0    83M| 666B  322B|   0     0 |1145    40 

5. dd if=/dev/zero of=./$RANDOM bs=32768k count=90 oflag=direct
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   1   0   0| 379B   25k|   0     0 |   0     0 |1007    22 
  0   2   0  98   0   0|   0   114M| 774B  306B|   0     0 |1203    47 
  0   1   0  98   1   0|   0   129M| 666B  322B|   0     0 |1226    43 
  0   1   0  99   0   0|   0   101M| 786B  322B|   0     0 |1179    43 
  0   2   0  97   1   0|   0   112M| 966B  322B|   0     0 |1199    54 
  0   1   0  98   1   0|   0   129M| 516B  322B|   0     0 |1225    44 
  1   2   0  97   0   0|   0   109M| 426B  322B|   0     0 |1188    48 
  0   1   0  98   1   0|   0   121M| 486B  322B|   0     0 |1203    44 
  0   1   0  99   0   0|   0   108M| 575B  322B|   0     0 |1185    40 
  0   2   0  97   1   0|   0   116M| 606B  322B|   0     0 |1183    44 

6. dd if=/dev/zero of=./$RANDOM bs=65536k count=26 oflag=direct
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   1   0   0| 379B   28k|   0     0 |   0     0 |1007    22 
  0   1   0  99   0   0|   0   105M| 624B  306B|   0     0 |1181    57 
  0   1   0  99   0   0|   0   127M| 636B  412B|   0     0 |1201    50 
  0   2   0  97   1   0|   0   127M| 426B  322B|   0     0 |1218    63 
  0   1   0  99   0   0|   0   121M| 426B  322B|   0     0 |1215    48 
  0   2   0  97   1   0|   0   113M| 426B  322B|   0     0 |1175    50 
  0   1   0  99   0   0|   0   117M| 632B  322B|   0     0 |1196    81 
  1   1   0  96   1   1|   0   130M| 546B  322B|   0     0 |1217    65 
  0   2   0  97   1   0|   0   125M| 306B  322B|   0     0 |1205    56 
  0   1   0  99   0   0|   0   114M| 486B  322B|   0     0 |1193    46 
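
To compare the dstat samples with dd's own figure, the `writ` column can be averaged. The sketch below does this for run 6 using the nine non-idle samples shown above; it assumes dstat's `k`/`M` suffixes are 1024-based and that each row is a one-second sample, and the helper names are illustrative only.

```python
# 'writ' column from the dstat samples for run 6 (bs=65536k), idle row omitted
WRIT_SAMPLES = ["105M", "127M", "127M", "121M", "113M", "117M", "130M", "125M", "114M"]

# Assumption: dstat suffixes are 1024-based
UNIT_BYTES = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def dstat_to_bytes(field):
    """Convert a dstat field such as '6592k' or '105M' to bytes."""
    field = field.strip()
    if field[-1] in UNIT_BYTES:
        return float(field[:-1]) * UNIT_BYTES[field[-1]]
    return float(field)  # a bare number such as '0'


def mean_mb_per_s(samples):
    """Average of one-second samples, in decimal MB/s to match dd's units."""
    total = sum(dstat_to_bytes(s) for s in samples)
    return total / len(samples) / 1e6


print(f"{mean_mb_per_s(WRIT_SAMPLES):.0f} MB/s average over {len(WRIT_SAMPLES)} samples")
```

The samples average about 120M in dstat's units, against the 50.8 MB/s that dd reported for the same run. The excerpt covers only part of the 34-second run, so the two figures are not directly comparable.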


--- results of sysbench with dstat output for slow VM host ---

[root@slow1host disk-test]# sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw run
sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16

Extra file open flags: 0
128 files, 24Mb each
3Gb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Done.

Operations performed:  6000 Read, 4000 Write, 12672 Other = 22672 Total
Read 93.75Mb  Written 62.5Mb  Total transferred 156.25Mb  (31.248Mb/sec)
 1999.85 Requests/sec executed

Test execution summary:
    total time:                          5.0004s
    total number of events:              10000
    total time taken by event execution: 1.3632
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.14ms
         max:                                165.03ms
         approx.  95 percentile:               0.03ms

Threads fairness:
    events (avg/stddev):           625.0000/239.85
    execution time (avg/stddev):   0.0852/0.05
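
The sysbench summary above can be cross-checked from the request counts and the 16Kb block size. This is a small sketch using only the numbers printed above; it assumes sysbench's "Mb" is 1024-based megabytes, which is what makes the totals line up.

```python
BLOCK_KB = 16            # "Block size 16Kb"
READS, WRITES = 6000, 4000
TOTAL_SECONDS = 5.0004   # "total time"


def transferred_mb(requests, block_kb=BLOCK_KB):
    """Data moved by `requests` operations, in sysbench's 1024-based 'Mb'."""
    return requests * block_kb / 1024


read_mb = transferred_mb(READS)
written_mb = transferred_mb(WRITES)
total_mb = read_mb + written_mb
print(f"read {read_mb} Mb, written {written_mb} Mb, total {total_mb} Mb")
print(f"{total_mb / TOTAL_SECONDS:.2f} Mb/sec, "
      f"{(READS + WRITES) / TOTAL_SECONDS:.1f} requests/sec")
```

The whole run moves only about 156 Mb in 5 seconds, so it is a much smaller sample than the dd runs.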

[root@slow1host disk-test]# dstat --all
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0  99   1   0   0| 388B   40k|   0     0 |   0     0 |1007    22 
  0   0 100   0   0   0|   0     0 | 384B  306B|   0     0 |1008    26 
  0   0 100   0   0   0|   0     0 | 486B  322B|   0     0 |1012    25 
  1   6  34  58   1   0|   0    29M| 906B 1754B|   0     0 |1185  1159 
  1  10   0  88   1   0|   0    39M| 642B  420B|   0     0 |1231  1468 
  0   6   0  92   2   0|   0    33M| 606B  322B|   0     0 |1198  1243 
  1   2   0  97   0   0|   0    14M| 276B  322B|   0     0 |1094   575 
  0   4   0  95   1   0|   0    17M| 486B  322B|   0     0 |1117   681 
  1   3  65  31   0   0|   0  2144k| 876B 2348B|   0     0 |1027   142 
  0   0 100   0   0   0|   0     0 | 366B  322B|   0     0 |1008    23 
  0   0 100   0   0   0|   0     0 | 696B  322B|   0     0 |1013    29 



----- Original Message -----
From: "German Anders" <[email protected]>
To: "Russell E. Glaue" <[email protected]>
Cc: [email protected]
Sent: Wednesday, April 2, 2014 7:35:36 PM
Subject: Re: [ceph-users] write speed issue on RBD image


So the real 'fast' performance was 100MB/s? Or did you get improved numbers? 
I'm looking for a cluster that could provide at least 600-700MB/s of 
throughput per thread. Could you try these dd commands and see what the 
results are? 




dd if=/dev/zero of=./$RANDOM bs=4k count=220000 oflag=direct 
dd if=/dev/zero of=./$RANDOM bs=8k count=140000 oflag=direct 
dd if=/dev/zero of=./$RANDOM bs=16k count=90000 oflag=direct 
dd if=/dev/zero of=./$RANDOM bs=32k count=40000 oflag=direct 
dd if=/dev/zero of=./$RANDOM bs=32768k count=900 oflag=direct 
dd if=/dev/zero of=./$RANDOM bs=65536k count=260 oflag=direct 
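
Before running the six commands above, it may help to know how much data each one writes. The sketch below just multiplies block size by count; the helper name `total_gb` is illustrative.

```python
# (bs in KiB, count) for the six suggested dd commands
SUGGESTED = [
    (4, 220000),
    (8, 140000),
    (16, 90000),
    (32, 40000),
    (32768, 900),
    (65536, 260),
]


def total_gb(bs_kib, count):
    """Total bytes written by dd for bs=<bs_kib>k count=<count>, in decimal GB."""
    return bs_kib * 1024 * count / 1e9


for bs_kib, count in SUGGESTED:
    print(f"bs={bs_kib}k count={count}: {total_gb(bs_kib, count):.1f} GB")
```

The last two commands write roughly 30 GB and 17 GB respectively, which is worth checking against the free space in the VM.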




You could also try this other tool for measuring performance: 


sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw prepare 
sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw run 
sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw cleanup 


It's rare to need to reinstall the server to solve that kind of performance 
issue. Did you try restarting the iSCSI services? You could also run, for 
example, these commands: 


$ iscsiadm -m session 
$ iscsiadm -m node -T <IQN> -p <IP> -u 
$ iscsiadm -m discoverydb -t st -p <IP> -o delete 


Then make the connections again and see if that works. You could also run 
iotop with the --only option to see only the processes that are actually 
doing I/O to the disks: 


iotop --only 


Also try running "top" to see info like tasks, memory, CPU and swap, and look 
there for anything that is not normal. 


Hope this helps, 


Best regards, 

German Anders 
Field Storage Support Engineer 
Despegar.com - IT Team 

--- Original message --- 
Subject: Re: [ceph-users] write speed issue on RBD image 
From: Russell E. Glaue <[email protected]> 
To: German Anders <[email protected]> 
Date: Wednesday, April 2, 2014 19:20 


Switches in use: Dell PowerConnect 8132, 10GBaseT version. We're using 2x10GbE 
LAGs for each host. 
Firmware of the HBA on the hosts: PERC 6/i RAID card, latest firmware. 
Blades or "traditional" servers: traditional, Dell PowerEdge R710. Drives are 
2TB Seagate. 
Special options when formatting the XFS filesystem, and/or mount options: No. 
Hypervisor: KVM/libvirt/QEMU on CentOS 6.5. 

The dd test yields 1.6GB/s on the XFS volumes mounted on the hard drives and 
managed by the OSDs. 


The "oflag=direct" tests on both the slow-write and fast-write VMs report 
about 50MB/s. 
Putting dd in the background and running dstat in the foreground reports 
about the same results on all tested hosts (see output below). 


Now, without "oflag=direct", running dd in the background with dstat in the 
foreground shows a different story (see output below). 

For the fast-disk-write VMs, the data is written out within the first 7 
iterations of the 'dstat --all' output, at a throughput greater than 100MB/s. 

For the slow-disk-write VMs, however, 16 iterations of 'dstat --all' pass 
before significant writes are performed, and the writes run at a lower 
throughput, less than 50MB/s. 
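
The gap between what dd reports without oflag=direct and what reaches the disk can be put into numbers from the fast-VM output further down. This is a rough sketch: it assumes each dstat row is a one-second sample and treats the five rows with large `writ` values after dd returned as the flush period.

```python
DD_BYTES = 536870912
DD_SECONDS_REPORTED = 0.362167                 # fast VM, no oflag=direct
FLUSH_SAMPLES_M = [117, 128, 145, 128, 70]     # dstat 'writ' rows after dd returned


def mb_per_s(n_bytes, seconds):
    """Throughput in decimal MB/s, the unit dd prints."""
    return n_bytes / seconds / 1e6


reported = mb_per_s(DD_BYTES, DD_SECONDS_REPORTED)
# One row per second (assumed), so the flush took about len(samples) seconds.
flush_seconds = len(FLUSH_SAMPLES_M)
observed = mb_per_s(DD_BYTES, flush_seconds)
print(f"dd reported {reported:.0f} MB/s; "
      f"the disk absorbed the data at roughly {observed:.0f} MB/s")
```

So the 1.5 GB/s figure mostly reflects how quickly the guest could cache the data, while the write-out itself ran at around 100 MB/s.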


So this was a good test. For some reason, the VM OS is not writing right away. 
Any suggestions on how to address this? I would rather not just reinstall the 
OS, because I'd like to know how to prevent this from occurring again. 


results follow: 

--- fast-disk-write VMs without oflag=direct --- 
[root@fast1host tmp]# /bin/rm disk-test; dd if=/dev/zero of=disk-test bs=1048576 count=512 & 
[1] 4184 
[root@fast1host tmp]# dstat --all 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- 
usr sys idl wai hiq siq| read writ| recv send| in out | int csw 
1 1 96 2 0 0| 79k 730k| 0 0 | 0 0 |1013 56 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.362167 seconds, 1.5 GB/s 

1 14 0 82 3 0| 0 117M|1062B 728B| 0 0 |1118 78 
0 5 0 92 2 1| 0 128M| 366B 322B| 0 0 |1124 66 
0 6 0 91 3 0| 0 145M| 426B 322B| 0 0 |1138 58 
0 5 0 93 2 0| 0 128M| 486B 322B| 0 0 |1148 54 
0 2 54 43 1 0| 0 70M| 426B 322B| 0 0 |1082 36 
0 1 98 1 0 0| 0 440k| 606B 322B| 0 0 |1014 30 
0 0 100 0 0 0| 0 0 | 246B 322B| 0 0 |1007 40 
0 0 100 0 0 0| 0 0 | 426B 322B| 0 0 |1007 20 
0 0 100 0 0 0| 0 0 | 306B 322B| 0 0 |1006 22 
[1]+ Done dd if=/dev/zero of=disk-test bs=1048576 count=512 


--- slow-disk-write VMs without oflag=direct --- 
[root@slow1host tmp]# /bin/rm disk-test; dd if=/dev/zero of=disk-test bs=1048576 count=512 & 
[1] 25192 
[root@slow1host tmp]# dstat --all 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- 
usr sys idl wai hiq siq| read writ| recv send| in out | int csw 
0 0 99 1 0 0| 384B 9.9k| 0 0 | 0 0 |1007 22 
0 100 0 0 0 0| 0 0 | 678B 642B| 0 0 |1014 22 
0 100 0 0 0 0| 0 0 | 408B 322B| 0 0 |1010 27 
0 100 0 0 0 0| 0 0 | 546B 322B| 0 0 |1012 27 
0 99 0 1 0 0| 0 440k| 486B 322B| 0 0 |1014 35 
1 99 0 0 0 0| 0 0 | 246B 322B| 0 0 |1006 21 
0 100 0 0 0 0| 0 0 | 552B 420B| 0 0 |1013 25 
0 100 0 0 0 0| 0 0 | 426B 322B| 0 0 |1010 23 
0 100 0 0 0 0| 0 0 | 546B 322B| 0 0 |1013 27 
0 100 0 0 0 0| 0 48k| 306B 322B| 0 0 |1012 60 
0 100 0 0 0 0| 0 0 | 576B 322B| 0 0 |1012 23 
0 100 0 0 0 0| 0 0 | 366B 322B| 0 0 |1005 23 
0 100 0 0 0 0| 0 0 | 336B 322B| 0 0 |1008 25 
0 100 0 0 0 0| 0 0 | 606B 322B| 0 0 |1012 25 
0 99 0 1 0 0| 0 24k| 336B 322B| 0 0 |1009 31 
0 100 0 0 0 0| 0 0 | 366B 322B| 0 0 |1009 21 
0 100 0 0 0 0| 0 35M| 486B 322B| 0 0 |1055 41 
1 97 0 0 2 0| 0 44M| 306B 322B| 0 0 |1068 43 
0 100 0 0 0 0| 0 34M| 426B 322B| 0 0 |1052 45 
0 100 0 0 0 0| 0 32M| 606B 322B| 0 0 |1059 48 
0 99 0 0 1 0| 0 40M| 426B 322B| 0 0 |1068 39 
0 99 0 0 1 0| 0 40M| 732B 420B| 0 0 |1078 43 
0 99 0 0 1 0| 0 40M| 306B 322B| 0 0 |1074 43 
0 100 0 0 0 0| 0 32M| 426B 322B| 0 0 |1069 47 
0 98 0 1 1 0| 0 40M| 606B 322B| 0 0 |1086 54 
0 99 0 0 1 0| 0 32M| 426B 322B| 0 0 |1070 37 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 26.9738 seconds, 19.9 MB/s 

0 49 51 0 0 0| 0 24M| 624B 744B| 0 0 |1058 45 
0 0 100 0 0 0| 0 0 | 516B 322B| 0 0 |1011 23 
0 0 100 0 0 0| 0 0 | 426B 322B| 0 0 |1010 29 
1 0 99 0 0 0| 0 0 | 696B 322B| 0 0 |1013 23 
0 1 98 1 0 0| 0 48k| 306B 322B| 0 0 |1010 33 
[1]+ Done dd if=/dev/zero of=disk-test bs=1048576 count=512 


--- results on fast-disk-write VMs with oflag=direct --- 
[root@fast1host tmp]# /bin/rm disk-test; dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct & 
[1] 4191 
[root@fast1host tmp]# dstat --all 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- 
usr sys idl wai hiq siq| read writ| recv send| in out | int csw 
1 1 96 2 0 0| 78k 844k| 0 0 | 0 0 |1013 55 
0 1 0 99 0 0|8192B 105M| 684B 306B| 0 0 |1206 142 
0 2 0 97 1 0| 0 122M| 546B 322B| 0 0 |1246 140 
0 2 0 98 0 0| 0 119M| 606B 322B| 0 0 |1238 144 
0 1 0 98 1 0| 0 117M| 366B 322B| 0 0 |1232 140 
0 1 0 99 0 0| 0 88M| 426B 322B| 0 0 |1179 118 
0 1 0 99 0 0| 0 119M| 486B 322B| 0 0 |1236 140 
0 2 0 97 1 0| 0 106M| 366B 322B| 0 0 |1210 128 
0 1 0 99 0 0| 0 99M| 666B 322B| 0 0 |1210 120 
0 1 0 99 0 0| 0 112M| 246B 322B| 0 0 |1235 136 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 9.29427 seconds, 57.8 MB/s 

1 0 94 5 0 0| 0 6144k| 564B 744B| 0 0 |1024 31 
0 0 99 1 0 0| 0 80k| 606B 322B| 0 0 |1015 29 
0 0 100 0 0 0| 0 0 | 246B 322B| 0 0 |1007 20 
0 0 100 0 0 0| 0 0 | 426B 322B| 0 0 |1010 24 
0 0 100 0 0 0| 0 0 | 486B 322B| 0 0 |1011 24 
0 0 100 0 0 0| 0 0 | 486B 322B| 0 0 |1010 22 
[1]+ Done dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct 

--- results on slow-disk-write VMs with oflag=direct --- 
[root@slow1host tmp]# /bin/rm disk-test; dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct & 
[1] 25264 
[root@slow1host tmp]# dstat --all 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- 
usr sys idl wai hiq siq| read writ| recv send| in out | int csw 
0 0 99 1 0 0| 384B 11k| 0 0 | 0 0 |1007 22 
0 2 0 98 0 0| 0 119M| 744B 306B| 0 0 |1227 145 
0 1 0 98 1 0| 0 80M| 606B 322B| 0 0 |1172 109 
0 1 0 99 0 0| 0 99M| 246B 322B| 0 0 |1216 125 
1 1 0 98 0 0| 0 92M| 426B 322B| 0 0 |1206 115 
0 1 0 98 1 0| 0 68M| 366B 322B| 0 0 |1149 95 
0 2 0 98 0 0| 0 90M| 366B 322B| 0 0 |1187 123 
0 8 0 91 1 0| 0 102M| 666B 322B| 0 0 |1205 141 
0 1 0 99 0 0| 0 88M| 246B 322B| 0 0 |1178 115 
0 1 0 99 0 0| 0 79M| 486B 322B| 0 0 |1150 105 
0 2 0 98 0 0| 0 103M| 546B 322B| 0 0 |1206 126 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 10.9108 seconds, 49.2 MB/s 

0 1 24 74 1 0| 0 93M| 504B 744B| 0 0 |1187 133 
0 0 100 0 0 0| 0 0 | 546B 322B| 0 0 |1010 25 
0 0 100 0 0 0| 0 0 | 306B 322B| 0 0 |1008 25 
0 0 100 0 0 0| 0 0 | 186B 322B| 0 0 |1007 25 
0 0 100 0 0 0| 0 0 | 486B 322B| 0 0 |1009 25 
0 0 100 0 0 0| 0 56k| 306B 322B| 0 0 |1010 33 
0 0 100 0 0 0| 0 0 | 786B 644B| 0 0 |1016 21 
0 0 100 0 0 0| 0 0 | 486B 322B| 0 0 |1010 27 
0 0 100 0 0 0| 0 0 | 576B 322B| 0 0 |1011 23 
0 0 99 0 0 1| 0 0 | 906B 322B| 0 0 |1018 27 
1 0 99 0 0 0| 0 0 | 306B 322B| 0 0 |1007 25 
[1]+ Done dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct 




----- Original Message ----- 
From: "German Anders" <[email protected]> 
To: "Russell E. Glaue" <[email protected]> 
Cc: [email protected] 
Sent: Wednesday, April 2, 2014 3:50:26 PM 
Subject: Re: [ceph-users] write speed issue on RBD image 


Did you try those dd commands with oflag=direct? Like: 


dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct; \
dd if=disk-test of=/dev/null bs=1048576 iflag=direct; \
/bin/rm disk-test 


That way you bypass the host cache, so each write goes straight to the disk 
and dd waits for its ACK. 
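
A related way to see the cache effect without O_DIRECT is to time a buffered write with and without a final fsync(). Note that this swaps in a different technique from oflag=direct: the data still goes through the page cache, but with `sync=True` the timer includes the wait for write-out. This is a minimal sketch; `timed_write` and its parameters are illustrative names, and the 64 MiB size is chosen only to keep the demo short.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, block_size=1 << 20, sync=False):
    """Write `total_bytes` of zeros to `path` and return elapsed seconds.

    With sync=True the timer also covers fsync(), i.e. the wait until the
    kernel reports the data as written out rather than merely cached.
    """
    block = bytes(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        remaining = total_bytes
        while remaining > 0:
            # os.write may write fewer bytes than requested; loop until done
            remaining -= os.write(fd, block[:min(block_size, remaining)])
        if sync:
            os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    size = 64 << 20  # 64 MiB; the thread used 512 MiB
    with tempfile.TemporaryDirectory() as tmp:
        for sync in (False, True):
            secs = timed_write(os.path.join(tmp, "disk-test"), size, sync=sync)
            print(f"sync={sync}: {size / secs / 1e6:.0f} MB/s")
```

On a system with free memory the unsynced figure is usually much higher, for the same reason the buffered dd runs in this thread report GB/s rates.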


Then see whether the performance numbers change, and whether the slow VMs 
behave any differently. You could also run those commands with an & at the 
end to put them in the background, and then immediately run dstat --all to 
see how much data is sent over the network in/out and how much is written to 
disk locally. 


Hope this helps. It would also be great if you could share a little more 
about the switches you are using, the firmware of the HBA on the hosts, 
whether they are blades or "traditional" servers, whether you used any 
special options when formatting the XFS filesystem and/or mount options, and 
what hypervisor you are using. 




Best regards, 

German Anders 
Field Storage Support Engineer 
Despegar.com - IT Team 

--- Original message --- 
Subject: [ceph-users] write speed issue on RBD image 
From: Russell E. Glaue <[email protected]> 
To: <[email protected]> 
Date: Wednesday, April 2, 2014 15:12 

Can someone recommend some testing I can do to further investigate why this 
slow disk write issue in the VM OS is occurring? 
It seems the issue, detailed below, is perhaps related to the VM OS running 
on the RADOS images in Ceph. 


Issue: 
I have a handful (about 10) of VMs running that, when tested, report slow disk 
write speeds of 8MB/s-30MB/s. All of the remaining VMs (about 40) report fast 
disk write speeds averaging 800MB/s-1.0GB/s. No VMs report disk write speeds 
in between these numbers. Restarting the OS on any of the VMs does not resolve 
the issue. 

After these tests, I took one of the VMs (image02host) with slow disk write 
speed and reinstalled the basic OS, including repartitioning the disk. I used 
the same RADOS image. After this, I retested this VM (image02host) and all the 
other VMs with slow disk write speed. The VM I reinstalled (image02host) no 
longer has slow disk write speeds. And, surprisingly, one of the other VMs 
(another-host) with slow disk write speed started having fast write speeds. 
All other VMs with slow disk write speed stayed the same. 

So, I do not necessarily believe the slow disk issue is directly related to any 
kind of bug or outstanding issue with Ceph/RADOS. I only have a couple of 
guesses at this point: 
1. Perhaps my OS install (or possibly its configuration) is somehow at fault. 
I don't see how this is possible, however. All the VMs I have tested were 
kick-started with the same disk and OS configuration, so they are virtually 
identical, yet they show either fast or slow disk write speed. 
2. Perhaps I have some bad sectors or a hard drive error at the hardware level 
that is causing the issue. Perhaps the RADOS images of this handful (about 10) 
of VMs are being written across a bad part of a hard drive. This seems more 
likely to me. However, all drives across all Ceph hosts are reporting good 
health. 

So now I have come to the ceph-users list to ask for help. What can I do to 
test whether there is a bad sector or hardware error on one of the hard 
drives, or some issue with Ceph writing to part of one of the hard drives? Or 
are there any other tests I can run to help determine possible issues? 

Secondly, if I wanted to move a RADOS image to new OSD blocks, is there a way 
to do that without exporting and importing the image? Redistributing the image 
across the OSDs and testing again might show whether the slow disk write speed 
comes from how the image is currently spread across OSDs, which would indicate 
a bad OSD hard drive or bad parts of an OSD hard drive. 


Ceph Configuration: 
* Ceph Version 0.72.2 
* Three Ceph hosts, CentOS 6.5 OS, using XFS 
* All connected via 10GbE network 
* KVM/QEMU Virtualization, with Ceph support 
* Virtual Machines are all RHEL 5.9 32bit 
* Our Ceph setup is very basic. One pool for all VM disks, all drives on all 
Ceph hosts are in that pool. 
* Ceph Caching is on: 
rbd cache = true 
rbd cache size = 128 
rbd cache max dirty = 64 
rbd cache target dirty = 64 
rbd cache max dirty age = 10.0 
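
One thing that may be worth double-checking in the settings above: the Ceph documentation gives `rbd cache size`, `rbd cache max dirty` and `rbd cache target dirty` in bytes, and `rbd cache max dirty age` in seconds. If the 128 and 64 above were meant as megabytes (an assumption; the thread does not say), the byte values could be derived as in this sketch. The function name `rbd_cache_settings` is illustrative only.

```python
MIB = 1024 * 1024


def rbd_cache_settings(cache_mib, max_dirty_mib, target_dirty_mib, max_dirty_age_s):
    """Return ceph.conf lines with sizes converted from MiB to bytes."""
    return [
        "rbd cache = true",
        f"rbd cache size = {cache_mib * MIB}",
        f"rbd cache max dirty = {max_dirty_mib * MIB}",
        f"rbd cache target dirty = {target_dirty_mib * MIB}",
        f"rbd cache max dirty age = {max_dirty_age_s}",
    ]


# 128 / 64 / 64 interpreted as MiB: an assumption, not stated in the thread
print("\n".join(rbd_cache_settings(128, 64, 64, 10.0)))
```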


Test: 
Here I provide the test results of two VMs that use disk images from the same 
Ceph pool and were cloned from the same RADOS snapshot. They both have exactly 
the same KVM configuration. However, they report dramatically different write 
speeds. When I tested them, both were running on the same Ceph host. For the 
VM reporting slow disk write speed, I even ran it on a different Ceph host to 
test, and it still gave the same disk write speed results. 

[root@linux]# rbd -p images info osimage01 
rbd image 'osimage01': 
        size 28672 MB in 7168 objects 
        order 22 (4096 kB objects) 
        block_name_prefix: rbd_data.2bfb74b0dc51 
        format: 2 
        features: layering 
[root@linux]# rbd -p images info osimage02 
rbd image 'osimage02': 
        size 28672 MB in 7168 objects 
        order 22 (4096 kB objects) 
        block_name_prefix: rbd_data.2c1a2ae8944a 
        format: 2 
        features: layering 
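
The `size` and `order` fields above are related by simple arithmetic, sketched here. The helper name `rbd_object_count` is illustrative; the only inputs are the numbers printed by `rbd info`.

```python
def rbd_object_count(size_mb, order):
    """Number of objects backing an RBD image of `size_mb` MB (1024-based).

    `order` is the log2 of the object size in bytes, as printed by `rbd info`.
    """
    object_bytes = 1 << order
    image_bytes = size_mb * 1024 * 1024
    return -(-image_bytes // object_bytes)  # ceiling division


print(1 << 22)                       # object size in bytes for order 22
print(rbd_object_count(28672, 22))   # → 7168
```

Since each 4 MB object is placed independently, an image of this size would normally be spread over every OSD in the pool rather than sitting on one drive.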

None of the images used are cloned. 

[root@linux]# ssh image01host 
image01host [65]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.760446 seconds, 706 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.214783 seconds, 2.5 GB/s 
image01host [66]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.514886 seconds, 1.0 GB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.198433 seconds, 2.7 GB/s 
image01host [67]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.562401 seconds, 955 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.223297 seconds, 2.4 GB/s 

[root@linux]# ssh image02host 
image02host [66]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 18.8284 seconds, 28.5 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.158142 seconds, 3.4 GB/s 
image02host [67]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 29.1494 seconds, 18.4 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.244414 seconds, 2.2 GB/s 
image02host [68]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 26.5817 seconds, 20.2 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.17213 seconds, 3.1 GB/s 
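
The dd summary lines above are easy to compare programmatically. The sketch below parses a summary line with a regular expression and recomputes the rate from bytes and seconds; `parse_dd_summary` is an illustrative helper, and the pattern is only assumed to cover the two formats seen in this thread ("seconds" and "s").

```python
import re

DD_SUMMARY = re.compile(r"(\d+) bytes \(.*?\) copied, ([\d.]+) s(?:econds)?, ")


def parse_dd_summary(line):
    """Return (bytes, seconds, MB/s) recomputed from a dd summary line."""
    match = DD_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    n_bytes, seconds = int(match.group(1)), float(match.group(2))
    return n_bytes, seconds, n_bytes / seconds / 1e6


fast = parse_dd_summary("536870912 bytes (537 MB) copied, 0.760446 seconds, 706 MB/s")
slow = parse_dd_summary("536870912 bytes (537 MB) copied, 18.8284 seconds, 28.5 MB/s")
print(f"fast {fast[2]:.0f} MB/s, slow {slow[2]:.1f} MB/s, "
      f"ratio {fast[2] / slow[2]:.0f}x")
```

The read-back figures of 2 to 3 GB/s on both hosts are far above what the disks deliver elsewhere in this thread, which suggests the reads are served from the guest's page cache rather than from the RBD image.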


((After reinstalling the OS on VM image02host using RADOS image osimage02)) 
[root@image02host tmp]# dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.453372 seconds, 1.2 GB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.145874 seconds, 3.7 GB/s 
[root@image02host tmp]# dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.591697 seconds, 907 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.175544 seconds, 3.1 GB/s 
[root@image02host tmp]# dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.599345 seconds, 896 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.164405 seconds, 3.3 GB/s 

((As mentioned, surprisingly, this other host started having fast disk write 
speeds only after image02host was reinstalled. But I am not understanding why 
this would be related.)) 
another-host [65]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 7.88853 seconds, 68.1 MB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.273677 seconds, 2.0 GB/s 
# image02host was reinstalled before the next command was issued # 
another-host [66]% dd if=/dev/zero of=disk-test bs=1048576 count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.533444 seconds, 1.0 GB/s 
512+0 records in 
512+0 records out 
536870912 bytes (537 MB) copied, 0.198121 seconds, 2.7 GB/s 



_______________________________________________ 
ceph-users mailing list 
[email protected] 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
