Hi,
I have 24 OSDs (all SSD) in a single cluster node, and I created/mapped three 
500GB images from 3 clients (one each) with krbd. The replication factor is the 
default (with the failure domain changed to OSD, since it is a single node). I 
ran the following fio script in parallel from the 3 clients. As usual, I saw 
very dismal and bursty write performance (~1500K aggregated at peak).


[random-write]
ioengine=libaio
iodepth=64
filename=/dev/rbd1
rw=randwrite
bs=64k
direct=1
size=500G
numjobs=1
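For context, here is a quick back-of-envelope sketch (in Python, just arithmetic on the numbers from the job file above) of how much data each client can have in flight with this config:

```python
# How much data each client can keep outstanding with the fio job above.
# All inputs come straight from the job file; nothing here is measured.
iodepth = 64          # outstanding async IOs per job (iodepth=64)
bs = 64 * 1024        # block size in bytes (bs=64k)
numjobs = 1           # numjobs=1
clients = 3           # three krbd clients, per the setup described

inflight_per_client = iodepth * bs * numjobs   # bytes in flight per client
total_inflight = inflight_per_client * clients

print(inflight_per_client // 2**20, "MiB in flight per client")   # 4 MiB
print(total_inflight // 2**20, "MiB in flight across 3 clients")  # 12 MiB
```

So at any instant there is at most ~4 MiB of dirty data outstanding per client, which bounds how hard the cluster can be pushed.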


I tried to analyze what was wrong on the OSD side and found that most of the 
OSDs had ~0, or at most 1, thread running. So it is clear that not enough data 
is coming from upstream; otherwise at least the messenger threads would have 
been busy!

Next, I looked at the 'ifstat' output; here it is for the cluster node.


       p2p1
KB/s in  KB/s out
113072.4   1298.51
100719.1   1185.05
114284.6   1324.86
178834.0   2099.69
211549.1   2376.65
29087.41    366.01
12456.08    174.72
1347.05     23.78
    1.01      3.68
    0.23      3.68
    1.08      4.43
    2.26      4.91
    0.76      3.88
69746.51    862.42
40927.77    491.73
60142.53    733.56
40593.33    500.36
50403.06    622.71
108577.1   1303.91
158618.9   1804.28
90437.46   1027.48
136244.7   1510.99
    0.54      3.95
    0.63      3.68
    0.24      3.75
    6.25      3.83
    0.74      3.68
    6.69      4.07
44616.63    547.42
63502.84    757.72
73507.45    852.72
230326.2   2528.38
157839.6   1802.30
189603.3   2122.25
82581.25    965.03
69347.60    799.37
118248.8   1368.59
70940.87    878.81
64014.78    773.66
97979.96   1134.85
150346.3   1631.18
84263.38    979.29
60342.13    730.17
156632.1   1791.12
176290.1   2062.07
120000.4   1347.99
30044.77    387.37
24333.55    324.90


So, you can see the bursty nature of the input to the cluster, with periodic 
drops to almost 0 KB in! The highest sample you can see from the 3 clients is 
~136244 KB/s. The average is ~70 MB/s, which at a 64k block size is only about 
1100 IOPS!!
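To make that conversion explicit, here is the arithmetic behind the average quoted above (the ~70 MB/s figure is eyeballed from the ifstat samples, not an exact mean):

```python
# Convert the eyeballed ~70 MB/s aggregate write-in rate into IOPS,
# given the 64k block size from the fio job.
avg_kb_s = 70 * 1024   # ~70 MB/s aggregate, expressed in KB/s
bs_kb = 64             # bs=64k

iops = avg_kb_s / bs_kb
print(round(iops), "IOPS aggregate across the 3 clients")  # 1120
```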

Now, here is the ifstat output for each client...

KB/s in  KB/s out
  439.23  37158.15
  699.95  59996.60
  898.57  80079.19
  397.90  31781.98
  324.35  27080.31
  127.05   9881.72
  244.84  20227.26
  233.70  19354.52
  338.95  27615.55
  212.83  17676.74
  458.11  39036.19
  694.03  59962.72
  479.04  41417.99
  379.50  31310.14
  403.29  34267.78
  511.71  44812.89
  370.45  32521.25
   94.49   7327.94
    1.11      0.51
    0.18      0.30
    0.63      0.30
    0.18      0.30
    1.33      1.48
    0.40      0.30
    0.40      0.30
    0.12      0.30
    5.76    336.43
  215.43  17002.90
  279.08  23719.46
   59.93   4638.77
  119.04   9073.26
    0.18      0.30
    1.99      0.30
    3.09      0.30
    0.47      0.37
    0.12      0.30
    1.39      0.30
   49.03   3831.99
  200.24  15390.87
  338.71  28017.09
  873.73  76383.56

Again, a similar pattern, with an average of roughly 25 MB/s per client!!!
We can argue about increasing numjobs/iodepth etc.; that may improve the flow 
a bit, but with a similar fio config the read flow will be vastly different (I 
will post results when I test random reads). A lot more 'data in' to the 
cluster, I would guess.

So, I think the major bottleneck for writes could be in the client (the krbd 
client in this case), not in the OSD layer (?). Let me know if I am missing 
anything.
Will fio push data only as fast as IO completions are happening?
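On that last question: with ioengine=libaio and direct=1, fio is completion-driven. It keeps at most `iodepth` IOs outstanding per job and submits a new one only as completions arrive, so by Little's law the achievable rate is capped by iodepth divided by average completion latency. A small sketch (the latency values below are illustrative assumptions, not measurements from this cluster):

```python
# Little's law applied to a single fio job: with a fixed queue depth,
# throughput = iodepth / avg_latency. Slow or bursty completions on the
# OSD side therefore directly throttle fio's submit rate, which would
# explain the on/off pattern in the ifstat samples.
iodepth = 64
bs = 64 * 1024  # bytes

for avg_latency_ms in (5, 20, 100):  # hypothetical average IO latencies
    iops = iodepth / (avg_latency_ms / 1000.0)
    mib_s = iops * bs / 2**20
    print(f"{avg_latency_ms:>4} ms avg latency -> {iops:8.0f} IOPS, "
          f"{mib_s:7.1f} MiB/s per client")
```

So if completions stall (e.g. during journal/filestore flushes on the single node), fio's pipeline drains and the network goes quiet until completions resume, which matches the bursty ifstat pattern above.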

Thanks & Regards
Somnath

