Hi,

Could you take a look at my problem?
It's about high latency on my OSDs on HP G8 servers (ceph01, ceph02 and ceph03).
When I run a rados bench for 60 seconds, the results are surprising: after a few
seconds there is no traffic, then it resumes, and so on.
In the end, the maximum latency is high and the VMs' disks freeze a lot.

# rados bench -p pool-test-g8 60 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 
objects
Object prefix: benchmark_data_ceph02_56745
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        82        66   263.959       264 0.0549584  0.171148
     2      16       134       118    235.97       208  0.344873  0.232103
     3      16       189       173   230.639       220  0.015583   0.24581
     4      16       248       232   231.973       236 0.0704699  0.252504
     5      16       306       290   231.974       232 0.0229872  0.258343
     6      16       371       355    236.64       260   0.27183  0.255469
     7      16       419       403    230.26       192 0.0503492  0.263304
     8      16       460       444   221.975       164 0.0157241  0.261779
     9      16       506       490   217.754       184  0.199418  0.271501
    10      16       518       502   200.778        48 0.0472324  0.269049
    11      16       518       502   182.526         0         -  0.269049
    12      16       556       540   179.981        76  0.100336  0.301616
    13      16       607       591   181.827       204  0.173912  0.346105
    14      16       655       639   182.552       192 0.0484904  0.339879
    15      16       683       667   177.848       112 0.0504184  0.349929
    16      16       746       730   182.481       252  0.276635  0.347231
    17      16       807       791   186.098       244  0.391491  0.339275
    18      16       845       829   184.203       152  0.188608  0.342021
    19      16       850       834   175.561        20  0.960175  0.342717
2015-05-28 17:09:48.397376min lat: 0.013532 max lat: 6.28387 avg lat: 0.346987
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    20      16       859       843   168.582        36 0.0182246  0.346987
    21      16       863       847   161.316        16   3.18544  0.355051
    22      16       897       881   160.165       136 0.0811037  0.371209
    23      16       901       885   153.897        16 0.0482124  0.370793
    24      16       943       927   154.484       168   0.63064  0.397204
    25      15       997       982   157.104       220 0.0933448  0.392701
    26      16      1058      1042   160.291       240  0.166463  0.385943
    27      16      1088      1072   158.798       120   1.63882  0.388568
    28      16      1125      1109   158.412       148 0.0511479   0.38419
    29      16      1155      1139   157.087       120  0.162266  0.385898
    30      16      1163      1147   152.917        32 0.0682181  0.383571
    31      16      1190      1174   151.468       108 0.0489185  0.386665
    32      16      1196      1180   147.485        24   2.95263  0.390657
    33      16      1213      1197   145.076        68 0.0467788  0.389299
    34      16      1265      1249   146.926       208 0.0153085  0.420687
    35      16      1332      1316   150.384       268 0.0157061   0.42259
    36      16      1374      1358   150.873       168  0.251626  0.417373
    37      16      1402      1386   149.822       112 0.0475302  0.413886
    38      16      1444      1428     150.3       168 0.0507577  0.421055
    39      16      1500      1484   152.189       224 0.0489163  0.416872
2015-05-28 17:10:08.399434min lat: 0.013532 max lat: 9.26596 avg lat: 0.415296
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    40      16      1530      1514   151.384       120  0.951713  0.415296
    41      16      1551      1535   149.741        84 0.0686787  0.416571
    42      16      1606      1590   151.413       220 0.0826855   0.41684
    43      16      1656      1640   152.542       200 0.0706539  0.409974
    44      16      1663      1647   149.712        28  0.046672  0.408476
    45      16      1685      1669    148.34        88 0.0989566  0.424918
    46      16      1707      1691   147.028        88 0.0490569  0.421116
    47      16      1707      1691     143.9         0         -  0.421116
    48      16      1707      1691   140.902         0         -  0.421116
    49      16      1720      1704   139.088   17.3333 0.0480335  0.428997
    50      16      1752      1736   138.866       128  0.053219    0.4416
    51      16      1786      1770   138.809       136  0.602946  0.440357
    52      16      1810      1794   137.986        96 0.0472518  0.438376
    53      16      1831      1815   136.967        84 0.0148999  0.446801
    54      16      1831      1815    134.43         0         -  0.446801
    55      16      1853      1837   133.586        44 0.0499486  0.455561
    56      16      1898      1882   134.415       180 0.0566593  0.461019
    57      16      1932      1916   134.442       136 0.0162902  0.454385
    58      16      1948      1932   133.227        64   0.62188  0.464403
    59      16      1966      1950    132.19        72  0.563613  0.472147
2015-05-28 17:10:28.401525min lat: 0.013532 max lat: 12.4828 avg lat: 0.472084
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
    60      16      1983      1967    131.12        68  0.030789  0.472084
    61      16      1984      1968   129.036         4 0.0519125  0.471871
    62      16      1984      1968   126.955         0         -  0.471871
    63      16      1984      1968   124.939         0         -  0.471871
    64      14      1984      1970   123.112   2.66667   4.20878  0.476035
Total time run:         64.823355
Total writes made:      1984
Write size:             4194304
Bandwidth (MB/sec):     122.425
Stddev Bandwidth:       85.3816
Max bandwidth (MB/sec): 268
Min bandwidth (MB/sec): 0
Average Latency:        0.520956
Stddev Latency:         1.17678
Max latency:            12.4828
Min latency:            0.013532
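
For what it's worth, one way to see which OSDs are blocking during the stalls
should be to watch the cluster log while the bench runs; slow request
warnings, if any, name the OSDs that are holding things up:

# ceph -w | grep -i 'slow request'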


I have installed a new ceph06 box which has much better latencies, but its
hardware is different (RAID card, disks, ...).
All OSDs are formatted with XFS but are mounted differently:

- On the Ceph boxes with high latency (the stripe size of the RAID logical
disk is 256KB):
/dev/sdd1 on /var/lib/ceph/osd/ceph-4 type xfs 
(rw,noatime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota)

- On the Ceph box with low latency (the stripe size of the RAID logical disk
is 128KB):

/dev/sdc1 on /var/lib/ceph/osd/ceph-9 type xfs 
(rw,noatime,attr2,inode64,noquota)
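
For comparison, the stripe geometry XFS recorded at mkfs time can be read
back with xfs_info (the sunit/swidth mount options above are in 512-byte
sectors, so sunit=512 matches the 256KB RAID stripe):

# xfs_info /var/lib/ceph/osd/ceph-4
# xfs_info /var/lib/ceph/osd/ceph-9

On ceph06 (low latency) I would expect it to report sunit=0, swidth=0, i.e.
no stripe alignment set.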

Thread on the Proxmox forum:
http://forum.proxmox.com/threads/22206-Ceph-High-apply-latency-on-OSD-causes-poor-performance-on-VM

If you have any ideas ...

Thanks.

Franck ALLOUIS
[email protected]


From: Franck Allouis
Sent: Thursday, May 21, 2015 17:12
To: '[email protected]'
Subject: High apply latency on OSD causes poor performance on VM

Hi,

Since we installed our new Ceph cluster, we have frequently seen high apply
latency on the OSDs (around 200 ms to 1500 ms), while commit latency stays
continuously at 0 ms!

According to the Ceph documentation, when you run the command "ceph osd perf",
fs_commit_latency is generally higher than fs_apply_latency. For us it's the
opposite.
The phenomenon has gotten worse since we upgraded Ceph (from Giant 0.87.1 to
Hammer 0.94.1).
The consequence is that our Windows VMs are very slow.
Could anyone tell us whether our configuration is sound, and in which
direction to investigate?

# ceph osd perf
osd fs_commit_latency(ms) fs_apply_latency(ms)
  0                     0                   62
  1                     0                  193
  2                     0                   88
  3                     0                  269
  4                     0                 1055
  5                     0                  322
  6                     0                  272
  7                     0                  116
  8                     0                  653
  9                     0                    4
 10                     0                    1
 11                     0                    7
 12                     0                    4
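
To dig further into osd.4, the counters behind these numbers can also be
dumped through its admin socket on the host that carries it (assuming the
default socket path); the filestore section there separates journal latency
from apply latency:

# ceph daemon osd.4 perf dump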

Some details about our configuration:

- Proxmox 3.4-6
- kernel: 3.10.0-10-pve
- Ceph: Hammer 0.94.1
- 3 hosts with 3 OSDs of 4 TB each (9 OSDs) + 1 SSD of 500 GB per host for journals
- 1 host with 4 OSDs of 300 GB (4 OSDs) + 1 SSD of 500 GB for journals

- OSD tree:

# ceph osd tree
ID WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 33.83995 root default
-6 22.91995     room salle-dr
-2 10.92000         host ceph01
 0  3.64000             osd.0        up  1.00000          1.00000
 2  3.64000             osd.2        up  1.00000          1.00000
 1  3.64000             osd.1        up  1.00000          1.00000
-3 10.92000         host ceph02
 3  3.64000             osd.3        up  1.00000          1.00000
 4  3.64000             osd.4        up  1.00000          1.00000
 5  3.64000             osd.5        up  1.00000          1.00000
-5  1.07996         host ceph06
 9  0.26999             osd.9        up  1.00000          1.00000
10  0.26999             osd.10       up  1.00000          1.00000
11  0.26999             osd.11       up  1.00000          1.00000
12  0.26999             osd.12       up  1.00000          1.00000
-7 10.92000     room salle-log
-4 10.92000         host ceph03
 6  3.64000             osd.6        up  1.00000          1.00000
 7  3.64000             osd.7        up  1.00000          1.00000
 8  3.64000             osd.8        up  1.00000          1.00000

- ceph.conf:

[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         auth supported = cephx
         cluster network = 10.10.1.0/24
         filestore xattr use omap = true
         fsid = 2dbbec32-a464-4bc5-bb2b-983695d1d0c6
         keyring = /etc/pve/priv/$cluster.$name.keyring
         mon osd adjust heartbeat grace = true
         mon osd down out subtree limit = host
         osd disk threads = 24
         osd heartbeat grace = 10
         osd journal size = 5120
         osd max backfills = 1
         osd op threads = 24
         osd pool default min size = 1
         osd recovery max active = 1
         public network = 192.168.80.0/24
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mon.0]
         host = ceph01
         mon addr = 192.168.80.41:6789
[mon.1]
         host = ceph02
         mon addr = 192.168.80.42:6789
[mon.2]
         host = ceph03
         mon addr = 192.168.80.43:6789
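
In case ceph.conf and the running daemons disagree, the effective values can
be checked per OSD through the admin socket, for example:

# ceph daemon osd.0 config show | grep -E 'osd_(op|disk)_threads'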

Thanks.
Best regards

Franck ALLOUIS
[email protected]


