Hi all,

I have some questions after running an RBD performance test.

Setup:
Linux kernel: 3.6.11
OS: Ubuntu 12.04
RAID card: LSI MegaRAID SAS 9260-4i (each HDD is exported as a single-disk
RAID0; Write Policy: Write Back with BBU, Read Policy: ReadAhead, IO Policy:
Direct)
Storage servers: 1
Per storage server:
8 * HDD (one OSD per HDD; 7200 rpm, 2 TB)
4 * SSD (each SSD is split into two partitions, sdx1 and sdx2, and serves as
the journal for 2 OSDs)

Ceph version: 0.56.4
Replicas: 2
Monitors: 1


Sequential write speed of a single HDD:
# dd if=/dev/zero of=/dev/sdd bs=1024k count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 69.3961 s, 151 MB/s

Sequential write speed of a single SSD:
# dd if=/dev/zero of=/dev/sdb bs=1024k count=10000 oflag=direct
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 40.8671 s, 257 MB/s
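
(Note: dd with oflag=direct is a sequential write. For a random-write
baseline I plan to also run fio -- assuming fio is installed; the target
device and the parameters below are just my guesses at a reasonable test:)

# fio --name=randwrite --filename=/dev/sdd --rw=randwrite --bs=4M \
      --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based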


Then we used the RADOS benchmark and collectl to observe the write performance:

# rados -p rbd bench 300 write -t 256

2013-04-05 14:31:13.732737min lat: 4.28207 max lat: 5.92085 avg lat: 4.78598
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   300     256     16043     15787   210.455       196      5.91   4.78598
Total time run:         300.588962
Total writes made:      16043
Write size:             4194304
Bandwidth (MB/sec):     213.488

Stddev Bandwidth:       40.6795
Max bandwidth (MB/sec): 288
Min bandwidth (MB/sec): 0
Average Latency:        4.75647
Stddev Latency:         0.37182
Max latency:            5.93183
Min latency:            0.590936
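
(A back-of-envelope check, assuming my understanding of the write path is
right: with 2 replicas every client byte is written twice, and with the
journal on SSD each replica is written once to the journal and once to the
data disk. So:

  SSD journals: 4 * ~257 MB/s = ~1028 MB/s  => client limit ~514 MB/s
  HDD data:     8 * ~151 MB/s = ~1208 MB/s  => client limit ~604 MB/s

Either way the ceiling should be well above the ~213 MB/s I measured.)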



collectl on the OSD data disks (HDDs):
#collectl  --iosize -sCDN --dskfilt "sd(c|d|e|f|g|h|i|j)"

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdc              0      0    0    0   76848    563  460  167     167    12    26      0   42
sdd              0      0    0    0   45100      0  165  273     273     6    36      1   30
sde              0      0    0    0   73800      0  270  273     273     3    14      1   41
sdf              0      0    0    0   73800      0  270  273     273    17    64      1   33
sdg              0      0    0    0   41000      0  150  273     273     1     7      0   10
sdh              0      0    0    0   57400      0  210  273     273     4    20      1   27
sdi              0      0    0    0   36904      0  136  271     271     0     5      0    7
sdj              0      0    0    0   77776      0  285  273     272    28    87      1   48


collectl on the journal SSDs:
#collectl  --iosize -sCDN --dskfilt "sd(b|k|l|m)"

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdb              0      0    0    0  115552      0  388  298     297    75   159      2   77
sdk              0      0    0    0  114592      0  389  295     294    12    33      0   38
sdl              0      0    0    0  100364      0  334  300     300    35   148      2   69
sdm              0      0    0    0  101644      0  345  295     294   245   583      2   99 <= almost 99%



My questions are:
1. The rados bench write workload is a random write, right?

2. Why does the write bandwidth hit a ceiling around 213 MB/s even when I
increase the concurrency (-t 512)?
   It looks worse than I expected, because collectl shows each SSD's write
throughput is only 100~120 MB/s, while the SSDs should be able to do ~250 MB/s.

3. Why is one SSD (sdm) at almost 99% [Util]? Does that mean the writes are
not evenly distributed across the OSDs?
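
   (To check the distribution myself, I plan to look at the CRUSH weights,
the PG mapping, and the pool's pg_num -- I believe these commands are all
available in 0.56, though the exact dump format may vary by version:)

   # ceph osd tree
   # ceph pg dump
   # ceph osd pool get rbd pg_num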

4. If the SSDs are not the bottleneck, what is limiting the write performance?

5. How can I improve the write performance?
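
(For reference, these are the journal/filestore settings I am thinking of
experimenting with in ceph.conf. The values below are just starting-point
guesses on my part, not tested recommendations:)

[osd]
    ; all values below are assumptions/guesses, not tuned settings
    osd op threads = 4
    filestore min sync interval = 1
    filestore max sync interval = 10
    journal max write bytes = 1073741824
    journal max write entries = 1000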

Thanks!!

- Kelvin
