Hi Jakub,

On 06/02/2018 at 16:03, Jakub Jaszewski wrote:
Hi Frederic,

I have not enabled debug-level logging on all OSDs, just on one for the test; I need to double-check that. But it looks like merging is ongoing on a few OSDs, or those OSDs are faulty. I will dig into that tomorrow.
Write bandwidth is very random

I just reread the whole thread:

- Splitting is not happening anymore - if it ever did - that's for sure.
- Regarding the write bandwidth variations, it seems that they only concern EC 6+3 pools.
- As you get more than 1.2 GB/s on replicated pools with 4 MB I/Os, I would think that neither the NVMe, nor the PERC, nor the HDDs are to blame.

Did you check the CPU load during EC 6+3 writes on pool default.rgw.buckets.data?

If you don't see any 100% CPU load, nor any 100% iostat issues on either the NVMe disk or HDDs, then I would benchmark the network for bandwidth or latency issues.

BTW, did you see that some of your OSDs are not tagged as 'hdd' (ceph osd df tree)?



# rados bench -p default.rgw.buckets.data 120 write
hints = 1
Maintaining 16 concurrent writes of 4194432 bytes to objects of size 4194432 for up to 120 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_59104
sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
0       0         0         0         0         0  -           0
1      16       155       139    555.93   556.017  0.0750027     0.10687
2      16       264       248   495.936   436.013 0.154185    0.118693
3      16       330       314   418.616   264.008 0.118476    0.142667
4      16       415       399   398.953    340.01  0.0873379     0.15102
5      16       483       467   373.557   272.008 0.750453    0.159819
6      16       532       516   343.962   196.006  0.0298334    0.171218
7      16       617       601   343.391    340.01 0.192698    0.177288
8      16       700       684   341.963    332.01  0.0281355    0.171277
9      16       762       746   331.521   248.008  0.0962037    0.163734
 10      16       804       788   315.167   168.005  1.40356    0.196298
 11      16       897       881    320.33   372.011  0.0369085     0.19496
 12      16       985       969   322.966   352.011  0.0290563    0.193986
 13      15      1106      1091   335.657   488.015  0.0617642    0.188703
 14      16      1166      1150   328.537   236.007  0.0401884    0.186206
 15      16      1251      1235   329.299    340.01 0.171256    0.190974
 16      16      1339      1323   330.716   352.011 0.024222    0.189901
 17      16      1417      1401   329.613    312.01  0.0289473    0.186562
 18      16      1465      1449   321.967   192.006 0.028123    0.189153
 19      16      1522      1506    317.02   228.007 0.265448    0.188288
2018-02-06 13:43:21.412512 min lat: 0.0204657 max lat: 3.61509 avg lat: 0.18918
sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
 20      16      1564      1548   309.568   168.005  0.0327581     0.18918
 21      16      1636      1620    308.54   288.009  0.0715159    0.187381
 22      16      1673      1657   301.242   148.005  1.57285    0.191596
 23      16      1762      1746   303.621   356.011  6.00352    0.206217
 24      16      1885      1869   311.468   492.015  0.0298435    0.203874
 25      16      2010      1994   319.008   500.015  0.0258761    0.199652
 26      16      2116      2100   323.044   424.013  0.0533319     0.19631
 27      16      2201      2185    323.67    340.01 0.134796    0.195953
 28      16      2257      2241    320.11   224.007 0.473629    0.196464
 29      16      2333      2317   319.554   304.009  0.0362741    0.198054
 30      16      2371      2355   313.968   152.005 0.438141    0.200265
 31      16      2459      2443   315.194   352.011  0.0610629    0.200858
 32      16      2525      2509   313.593   264.008  0.0234799    0.201008
 33      16      2612      2596   314.635   348.011 0.072019    0.199094
 34      16      2682      2666   313.615   280.009  0.10062    0.197586
 35      16      2757      2741   313.225   300.009  0.0552581    0.196981
 36      16      2849      2833   314.746   368.011 0.257323     0.19565
 37      16      2891      2875   310.779   168.005  0.0918386     0.19556
 38      16      2946      2930    308.39   220.007  0.0276621    0.195792
 39      16      2975      2959   303.456   116.004  0.0588971     0.19952
2018-02-06 13:43:41.415107 min lat: 0.0204657 max lat: 7.9873 avg lat: 0.198749
sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
 40      16      3060      3044   304.369    340.01  0.0217136    0.198749
 41      16      3098      3082   300.652   152.005  0.0717398    0.199052
 42      16      3141      3125   297.589   172.005  0.0257422    0.201899
 43      15      3241      3226   300.063   404.012  0.0733869    0.209446
 44      16      3332      3316   301.424   360.011  0.0327249    0.206686
 45      16      3430      3414   303.436   392.012  0.0413156    0.203727
 46      16      3534      3518   305.882   416.013 0.033638    0.202182
 47      16      3602      3586   305.161   272.008  0.0453557    0.200028
 48      16      3663      3647   303.886   244.007  0.0779019    0.199777
 49      16      3736      3720   303.643   292.009  0.0285231    0.206274
 50      16      3849      3833   306.609   452.014  0.0537071    0.208127
 51      16      3909      3893   305.303   240.007  0.0366709    0.207793
 52      16      3972      3956   304.277   252.008  0.0289131    0.207989
 53      16      4048      4032   304.272   304.009  0.0348617    0.207844
 54      16      4114      4098   303.525   264.008  0.0799526     0.20701
 55      16      4191      4175   303.606   308.009 0.034915    0.206882
 56      16      4313      4297   306.898   488.015 0.108777    0.205869
 57      16      4449      4433   311.057   544.017  0.0862092    0.205232
 58      16      4548      4532    312.52   396.012 0.135814    0.203753
 59      16      4678      4662   316.036   520.016  0.0307446    0.202156
2018-02-06 13:44:01.417687 min lat: 0.0204657 max lat: 13.0888 avg lat: 0.201848
sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
 60      16      4734      4718   314.502   224.007 0.223548    0.201848
 61      16      4783      4767   312.559   196.006 0.124237    0.203641
 62      16      4838      4822   311.066   220.007 0.270943    0.204074
 63      16      4929      4913   311.905   364.011 0.123486    0.204716
 64      16      5063      5047   315.406   536.016 0.101675    0.202283
 65      16      5124      5108   314.307   244.007  0.0504368    0.201181
 66      16      5243      5227   316.756   476.015 0.235348    0.201481
 67      16      5339      5323   317.759   384.012  0.0478478    0.200486
 68      16      5414      5398   317.497   300.009 0.258401    0.200194
 69      16      5461      5445    315.62   188.006 0.112516     0.20022
 70      16      5523      5507   314.654   248.008  0.0927405    0.201534
 71      16      5582      5566   313.546   236.007 0.333586    0.202304
 72      16      5690      5674    315.19   432.013  0.0706812    0.201498
 73      16      5780      5764   315.804   360.011  0.0306772    0.200354
 74      16      5850      5834    315.32   280.009  0.0261027    0.200627
 75      16      5905      5889   314.048   220.007 0.101282     0.20075
 76      16      6013      5997     315.6   432.013 0.161956    0.202154
 77      16      6130      6114   317.578   468.014 0.042322     0.20092
 78      16      6232      6216   318.737   408.012  0.0912166    0.199238
 79      16      6260      6244    316.12   112.003  0.0406156    0.198971
2018-02-06 13:44:21.420343 min lat: 0.0204657 max lat: 13.0888 avg lat: 0.200443
sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
 80      16      6304      6288   314.368   176.005  0.0409417    0.200443
 81      16      6375      6359   313.993   284.009 0.150617     0.19986
 82      16      6447      6431   313.676   288.009  0.0730616    0.200072
 83      16      6513      6497   313.077   264.008  0.0565517    0.202155
 84      16      6594      6578   313.206    324.01  0.0273074    0.202981
 85      16      6732      6716   316.015   552.017 0.134015    0.202301
 86      16      6878      6862   319.131   584.018 0.157684    0.200182
 87      16      6971      6955   319.738   372.011 0.327923    0.199431
 88      16      7018      7002   318.241   188.006  0.0317034    0.198737
 89      16      7056      7040   316.373   152.005  0.0874119    0.199579
 90      16      7095      7079    314.59   156.005 0.027061    0.198883
 91      16      7150      7134   313.551   220.007 0.247791    0.199385
 92      16      7201      7185    312.36   204.006 0.131592    0.202395
 93      16      7216      7200   309.646   60.0018  0.0678182    0.202927
 94      16      7276      7260   308.905   240.007 0.040735    0.204362
 95      16      7304      7288   306.832   112.003  0.04629    0.204849
 96      16      7402      7386   307.719   392.012 0.085536    0.206601
 97      16      7524      7508   309.577   488.015 0.904426    0.205316
 98      16      7682      7666   312.866   632.019  0.0580214    0.204241
 99      16      7840      7824   316.089   632.019  0.13412    0.202289
2018-02-06 13:44:41.423052 min lat: 0.0204657 max lat: 13.0888 avg lat: 0.200948
sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
100      16      7949      7933   317.288   436.013 0.025568    0.200948
101      16      7996      7980   316.007   188.006 0.328559    0.201496
102      16      8020      8004    313.85   96.0029  0.0524293     0.20214
103      16      8044      8028   311.735   96.0029  0.0628428     0.20236
104      16      8062      8046    309.43   72.0022  0.0320078    0.202743
105      16      8088      8072   307.473   104.003  0.0506497    0.204222
106      16      8127      8111   306.044   156.005  0.0436112     0.20792
107      16      8226      8210   306.885   396.012  0.0452614    0.207666
108      16      8296      8280   306.635   280.009  0.0500199    0.207578
109      16      8367      8351   306.428   284.009  0.0364288    0.207779
110      16      8475      8459   307.569   432.013 0.140141    0.206752
111      16      8559      8543   307.825    336.01  0.0250007    0.206032
112      16      8696      8680   309.968   548.017  0.0249451    0.205808
113      16      8804      8788   311.048   432.013  0.49485    0.205075
114      16      8917      8901   312.284   452.014  0.0731665    0.204527
115      16      9047      9031    314.09   520.016  0.0365535    0.203553
116      16      9130      9114   314.244    332.01  0.0451302    0.203367
117      16      9259      9243   315.968   516.016  0.0610521    0.202253
118      16      9373      9357   317.154   456.014  0.0282051    0.201613
119      16      9465      9449   317.581   368.011  0.0299575    0.200845
2018-02-06 13:45:01.425689 min lat: 0.0204657 max lat: 13.0888 avg lat: 0.200298
sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
120      16      9545      9529   317.601    320.01  0.0264095    0.200298
121       7      9546      9539   315.307   40.0012 0.659852    0.201486
122       7      9546      9539   312.722         0  -    0.201486
Total time run:         122.442935
Total writes made:      9546
Write size:             4194432
Object size:            4194432
Bandwidth (MB/sec):     311.861
Stddev Bandwidth:       133.889
Max bandwidth (MB/sec): 632.019
Min bandwidth (MB/sec): 0
Average IOPS:           77
Stddev IOPS:            33
Max IOPS:               158
Min IOPS:               0
Average Latency(s):     0.203467
Stddev Latency(s):      0.531046
Max latency(s):         13.0888
Min latency(s):         0.0204657
Cleaning up (deleting benchmark objects)
Removed 9546 objects
Clean up completed and total clean up time :308.444893
#
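
The spread in the cur MB/s column above can be summarized from a saved copy of the output. This is only a minimal sketch: the function name and the idea of saving the output to a file are assumptions, not something used in this thread.

```shell
# Summarize the "cur MB/s" column (6th field) of saved `rados bench` output.
# Only per-second data rows are used: their first three fields (sec, Cur ops,
# started) are plain integers, which excludes headers, timestamp lines and
# the final summary block.
bench_cur_summary() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && NF >= 8 {
             v = $6 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             n++
         }
         END { if (n) printf "samples=%d min=%g max=%g\n", n, min, max }' "$1"
}
```

It could be used as `rados bench -p default.rgw.buckets.data 120 write | tee bench.txt` followed by `bench_cur_summary bench.txt`.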

Many slow requests appeared in the log once cleanup started. The affected OSDs, osd.53 and osd.91, are located on different nodes.

Slow requests during cleanup are surely due to merging.


2018-02-06 13:42:49.881647 mon.sg01-06 mon.0 10.212.32.18:6789/0 3574 : cluster [WRN] overall HEALTH_WARN noscrub,nodeep-scrub flag(s) set
2018-02-06 13:44:49.777466 mon.sg01-06 mon.0 10.212.32.18:6789/0 3577 : cluster [INF] mon.1 10.212.32.19:6789/0
2018-02-06 13:44:49.777627 mon.sg01-06 mon.0 10.212.32.18:6789/0 3578 : cluster [INF] mon.2 10.212.32.20:6789/0
2018-02-06 13:45:22.221472 mon.sg01-06 mon.0 10.212.32.18:6789/0 3579 : cluster [WRN] Health check failed: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:45:26.227333 mon.sg01-06 mon.0 10.212.32.18:6789/0 3580 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 4 slow requests are blocked > 32 sec)
2018-02-06 13:45:40.325885 mon.sg01-06 mon.0 10.212.32.18:6789/0 3582 : cluster [WRN] Health check failed: 26 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:45:46.360383 mon.sg01-06 mon.0 10.212.32.18:6789/0 3583 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 26 slow requests are blocked > 32 sec)
2018-02-06 13:46:08.518861 mon.sg01-06 mon.0 10.212.32.18:6789/0 3584 : cluster [WRN] Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:13.806928 mon.sg01-06 mon.0 10.212.32.18:6789/0 3585 : cluster [WRN] Health check update: 9 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:18.807371 mon.sg01-06 mon.0 10.212.32.18:6789/0 3586 : cluster [WRN] Health check update: 20 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:23.807835 mon.sg01-06 mon.0 10.212.32.18:6789/0 3587 : cluster [WRN] Health check update: 15 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:19.082992 osd.53 osd.53 10.212.32.22:6802/50082 167 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.071323 secs
2018-02-06 13:46:19.082999 osd.53 osd.53 10.212.32.22:6802/50082 168 : cluster [WRN] slow request 30.071323 seconds old, received at 2018-02-06 13:45:49.011626: osd_op(client.10075.0:25660466 3.50964d18 3:18b2690a:::rbd_data.bcc718ab7e41.000000000001018d:head [write 2703360~8192] snapc 0=[] ack+ondisk+write+known_if_redirected e5954) currently journaled_completion_queued
2018-02-06 13:46:21.083367 osd.53 osd.53 10.212.32.22:6802/50082 169 : cluster [WRN] 2 slow requests, 1 included below; oldest blocked for > 32.071703 secs
2018-02-06 13:46:21.083390 osd.53 osd.53 10.212.32.22:6802/50082 170 : cluster [WRN] slow request 30.141592 seconds old, received at 2018-02-06 13:45:50.941737: osd_op(client.841436.0:77497677 3.52e3e8ff 3:ff17c74a:::rbd_data.cd25332916c98.0000000000012b0b:head [write 245760~3612672] snapc 0=[] ack+ondisk+write+known_if_redirected e5954) currently commit_sent
2018-02-06 13:46:28.808214 mon.sg01-06 mon.0 10.212.32.18:6789/0 3589 : cluster [WRN] Health check update: 16 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:33.808664 mon.sg01-06 mon.0 10.212.32.18:6789/0 3590 : cluster [WRN] Health check update: 17 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:38.809093 mon.sg01-06 mon.0 10.212.32.18:6789/0 3591 : cluster [WRN] Health check update: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:43.809537 mon.sg01-06 mon.0 10.212.32.18:6789/0 3592 : cluster [WRN] Health check update: 2 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:44.760767 mon.sg01-06 mon.0 10.212.32.18:6789/0 3593 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 2 slow requests are blocked > 32 sec)
2018-02-06 13:46:52.819718 mon.sg01-06 mon.0 10.212.32.18:6789/0 3594 : cluster [WRN] Health check failed: 70 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:46:59.674390 mon.sg01-06 mon.0 10.212.32.18:6789/0 3595 : cluster [WRN] Health check update: 105 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-02-06 13:47:01.444356 osd.91 osd.91 10.212.32.25:6818/81813 5 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.028512 secs
2018-02-06 13:47:01.444365 osd.91 osd.91 10.212.32.25:6818/81813 6 : cluster [WRN] slow request 30.028512 seconds old, received at 2018-02-06 13:46:31.415754: osd_op(client.757449.0:155792306 3.92bf145a 3:5a28fd49:::rbd_data.8e125f170c6e.00000000000318b3:head [write 176128~4096] snapc 0=[] ack+ondisk+write+known_if_redirected e5954) currently sub_op_commit_rec from 72
2018-02-06 13:47:02.444612 osd.91 osd.91 10.212.32.25:6818/81813 7 : cluster [WRN] 12 slow requests, 5 included below; oldest blocked for > 31.028754 secs
2018-02-06 13:47:02.444637 osd.91 osd.91 10.212.32.25:6818/81813 12 : cluster [WRN] slow request 30.121723 seconds old, received at 2018-02-06 13:46:32.322785: osd_op(client.757449.0:155792349 3.92bf145a 3:5a28fd49:::rbd_data.8e125f170c6e.00000000000318b3:head [write 217088~4096] snapc 0=[] ack+ondisk+write+known_if_redirected e5954) currently sub_op_commit_rec from 72
2018-02-06 13:47:01.089688 osd.53 osd.53 10.212.32.22:6802/50082 171 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.546028 secs
2018-02-06 13:47:01.089695 osd.53 osd.53 10.212.32.22:6802/50082 172 : cluster [WRN] slow request 30.546028 seconds old, received at 2018-02-06 13:46:30.543617: osd_op(client.48366.0:29692284 3.859d4e2b 3.859d4e2b (undecoded) ack+ondisk+write+known_if_redirected e5954) currently queued_for_pg
2018-02-06 13:47:02.089919 osd.53 osd.53 10.212.32.22:6802/50082 173 : cluster [WRN] 9 slow requests, 5 included below; oldest blocked for > 31.546245 secs
2018-02-06 13:47:02.089926 osd.53 osd.53 10.212.32.22:6802/50082 174 : cluster [WRN] slow request 30.414557 seconds old, received at 2018-02-06 13:46:31.675305: osd_repop(client.757449.0:155792317 3.5a e5954/5842) currently queued_for_pg
2018-02-06 13:47:04.445182 osd.91 osd.91 10.212.32.25:6818/81813 24 : cluster [WRN] slow request 30.496769 seconds old, received at 2018-02-06 13:46:33.948299: osd_op(client.757449.0:155792384 3.92bf145a 3:5a28fd49:::rbd_data.8e125f170c6e.00000000000318b3:head [write 262144~4096] snapc 0=[] ack+ondisk+write+known_if_redirected e5954) currently sub_op_commit_rec from 72
2018-02-06 13:50:11.133054 osd.53 osd.53 10.212.32.22:6802/50082 864 : cluster [WRN] slow request 120.604868 seconds old, received at 2018-02-06 13:48:10.528083: osd_op(client.840950.0:169909382 3.cf2d18ff 3.cf2d18ff (undecoded) ack+ondisk+write+known_if_redirected e5954) currently queued_for_pg
2018-02-06 13:50:11.133058 osd.53 osd.53 10.212.32.22:6802/50082 865 : cluster [WRN] slow request 61.985051 seconds old, received at 2018-02-06 13:49:09.147900: osd_op(client.749595.0:78222152 3.4347d828 3.4347d828 (undecoded) ack+ondisk+write+known_if_redirected e5954) currently queued_for_pg
2018-02-06 13:50:11.133061 osd.53 osd.53 10.212.32.22:6802/50082 866 : cluster [WRN] slow request 31.256269 seconds old, received at 2018-02-06 13:49:39.876682: osd_op(client.749595.0:78223068 3.4347d828 3.4347d828 (undecoded) ack+ondisk+write+known_if_redirected e5954) currently queued_for_pg
2018-02-06 13:50:12.133218 osd.53 osd.53 10.212.32.22:6802/50082 867 : cluster [WRN] 48 slow requests, 1 included below; oldest blocked for > 168.672302 secs
2018-02-06 13:50:12.133223 osd.53 osd.53 10.212.32.22:6802/50082 868 : cluster [WRN] slow request 34.801266 seconds old, received at 2018-02-06 13:49:37.331905: osd_op(client.48192.0:97243662 3.7163d828 3.7163d828 (undecoded) ack+ondisk+write+known_if_redirected e5954) currently queued_for_pg
2018-02-06 13:50:18.139428 mon.sg01-06 mon.0 10.212.32.18:6789/0 3639 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 247 slow requests are blocked > 32 sec)
2018-02-06 13:50:49.882654 mon.sg01-06 mon.0 10.212.32.18:6789/0 3641 : cluster [WRN] overall HEALTH_WARN noscrub,nodeep-scrub flag(s) set
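
To see at a glance which OSDs report slow requests, the per-request warnings can be tallied by daemon name. This is a rough sketch that assumes a cluster log file with one entry per line and the daemon name in the third field, as in the entries above; the function name is illustrative, and the usual cluster log location on a monitor host (/var/log/ceph/ceph.log) is an assumption about the setup.

```shell
# Count "slow request N seconds old" warnings per reporting daemon.
# Assumes one log entry per line: date, time, daemon name, ...
slow_requests_per_osd() {
    grep -E '\[WRN\] slow request [0-9.]+ seconds old' "$1" |
        awk '{ count[$3]++ } END { for (o in count) print count[o], o }' |
        sort -k1,1nr -k2,2
}
```

For example, `slow_requests_per_osd /var/log/ceph/ceph.log` prints one line per OSD, busiest first.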

Testing with dd

# dd if=/dev/zero of=/var/lib/ceph/osd/ceph-53/dd_test bs=4M count=1000 oflag=direct
1000+0 records in
1000+0 records out
4194304000 bytes (4,2 GB, 3,9 GiB) copied, 29,3667 s, 143 MB/s
#

# dd if=/dev/zero of=/var/lib/ceph/osd/ceph-91/dd_test bs=4M count=1000 oflag=direct
1000+0 records in
1000+0 records out
4194304000 bytes (4,2 GB, 3,9 GiB) copied, 33,4732 s, 125 MB/s
#

Device Model:     TOSHIBA MG04ACA400N
Firmware Version: FJ2D
User Capacity:    4,000,787,030,016 bytes [4,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)


Please see below some lines with the number of files per directory in an affected filestore. I think the directories with 300+ files look like they have just been merged, or am I getting it wrong?


That's probably right, but now that there's a random factor, you may have had more than 320 files per subdirectory even with the previous default split and merge values.

But yeah, the blocked requests surely come from merging operations.
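
For reference, the 320 figure matches the commonly documented FileStore rule that a subdirectory is split once it holds more than filestore_split_multiple * abs(filestore_merge_threshold) * 16 files. Below is a minimal sketch of that arithmetic, assuming the older defaults of 2 and 10 and ignoring the random factor mentioned above; the function name is only illustrative.

```shell
# Assumed FileStore rule (random factor not modelled):
#   split point = filestore_split_multiple * abs(filestore_merge_threshold) * 16
split_threshold() {
    multiple=$1
    merge=$2
    # A negative merge threshold disables merging; only its absolute
    # value enters the split calculation.
    if [ "$merge" -lt 0 ]; then
        merge=$(( -merge ))
    fi
    echo $(( multiple * merge * 16 ))
}

split_threshold 2 10    # assumed old defaults: prints 320
```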

...
337 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_E
21 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_0
18 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_1
21 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_6
25 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_2
18 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_3
22 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_4
20 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_5
27 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_7
22 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_8
25 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_9
19 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_A
22 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_B
28 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_C
20 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_D
19 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_E
28 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_4/DIR_F/DIR_F
338 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_0
344 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_1
320 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_2
50 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3
21 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_0
26 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_1
20 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_2
13 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_3
28 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_4
19 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_5
20 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_6
17 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_7
21 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_8
22 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_9
23 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_B
23 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_E
26 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_D
17 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_3/DIR_F
325 in /var/lib/ceph/osd/ceph-91/current/20.83s4_head/DIR_3/DIR_8/DIR_8/DIR_4
...
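
A listing in the same "N in DIRECTORY" form can be produced with standard tools. The sketch below is not necessarily how the numbers above were obtained: the function name is hypothetical, and it counts only regular files directly inside each directory, which may differ from the original method.

```shell
# Print "<count> in <directory>" for every directory under the given path,
# counting only regular files directly inside each directory.
files_per_dir() {
    find "$1" -type d | while IFS= read -r dir; do
        n=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
        echo "$n in $dir"
    done
}
```

For instance, `files_per_dir /var/lib/ceph/osd/ceph-91/current/20.83s4_head | sort -k3` would list one PG's subdirectories in path order. Walking a large filestore this way adds read load of its own, so it is probably best run on one PG directory at a time.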

Does ceph-objectstore-tool merge/split the filestore structure? I can try it tomorrow.

It only does the splitting part. You might want to set filestore_merge_threshold to -40 to disable merging, or provoke merging during off-peak hours with something like this:

while true ; do rados bench -p default.rgw.buckets.data 10 write -b 4K -t 16 ; sleep 60 ; done

Each of the 1024 PGs should rapidly see an object write and delete, and then be merged.

Regards,

Frédéric.


Many thanks
Jakub
