On 11/22/18 6:12 PM, Marco Gaiarin wrote:
Greetings, Paweł Sadowski!
   Regarding what was discussed that day...

 From your osd tree it looks like you used 'ceph osd reweight'.
Yes, and I assumed I was doing the right thing!

Now I've tried to lower the weight of the OSD to be decommissioned, using:
        ceph osd reweight 2 0.95

leading to an OSD tree like:

  root@blackpanther:~# ceph osd tree
  ID WEIGHT   TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 21.83984 root default
  -2  5.45996     host capitanamerica
   0  1.81999         osd.0                up  1.00000          1.00000
   1  1.81999         osd.1                up  1.00000          1.00000
  10  0.90999         osd.10               up  1.00000          1.00000
  11  0.90999         osd.11               up  1.00000          1.00000
  -3  5.45996     host vedovanera
   2  1.81999         osd.2                up  0.95000          1.00000
   3  1.81999         osd.3                up  1.00000          1.00000
   4  0.90999         osd.4                up  1.00000          1.00000
   5  0.90999         osd.5                up  1.00000          1.00000
  -4  5.45996     host deadpool
   6  1.81999         osd.6                up  1.00000          1.00000
   7  1.81999         osd.7                up  1.00000          1.00000
   8  0.90999         osd.8                up  1.00000          1.00000
   9  0.90999         osd.9                up  1.00000          1.00000
  -5  5.45996     host blackpanther
  12  1.81999         osd.12               up  0.04999          1.00000
  13  1.81999         osd.13               up  0.04999          1.00000
  14  0.90999         osd.14               up  0.04999          1.00000
  15  0.90999         osd.15               up  0.04999          1.00000

and, after rebalancing, to:

  root@blackpanther:~# ceph -s
     cluster 8794c124-c2ec-4e81-8631-742992159bd6
      health HEALTH_WARN
             6 pgs stuck unclean
             recovery 4/2550363 objects degraded (0.000%)
             recovery 11282/2550363 objects misplaced (0.442%)
      monmap e6: 6 mons at 
{0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0}
             election epoch 2750, quorum 0,1,2,3,4,5 blackpanther,0,1,4,2,3
      osdmap e7300: 16 osds: 16 up, 16 in; 6 remapped pgs
       pgmap v54737590: 768 pgs, 3 pools, 3299 GB data, 830 kobjects
             9870 GB used, 12474 GB / 22344 GB avail
             4/2550363 objects degraded (0.000%)
             11282/2550363 objects misplaced (0.442%)
                  761 active+clean
                    6 active+remapped
                    1 active+clean+scrubbing
   client io 13476 B/s rd, 654 kB/s wr, 95 op/s

Why are there PGs stuck in the 'unclean' state?

This is most probably due to the big difference in weights between your hosts (the new one has a 20x lower weight than the old ones), which in combination with the straw algorithm is a 'known' issue. You could try to increase *choose_total_tries* in your crush map from the default 50 to some bigger number. The best option, IMO, would be to switch to straw2 (which will cause some data movement) and then use 'ceph osd crush reweight' (instead of 'ceph osd reweight') in small steps to slowly rebalance data onto the new OSDs.
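
A minimal sketch of those steps (standard ceph/crushtool commands; the tunable value, the osd.12 target, and the step sizes below are only examples, so adapt them to your cluster before applying anything):

        # dump and decompile the current crush map
        ceph osd getcrushmap -o crushmap.bin
        crushtool -d crushmap.bin -o crushmap.txt

        # in crushmap.txt, raise the tunable, e.g.
        #   tunable choose_total_tries 100
        # and/or change 'alg straw' to 'alg straw2' in the bucket
        # definitions, then recompile and inject the map
        crushtool -c crushmap.txt -o crushmap.new
        ceph osd setcrushmap -i crushmap.new

        # your new OSDs currently have full crush weight but a ~0.05
        # reweight override; to switch approaches, first drop the crush
        # weight, then clear the override (shown here for osd.12)
        ceph osd crush reweight osd.12 0.1
        ceph osd reweight 12 1.0

        # then raise the crush weight in small steps, letting the
        # cluster settle in between
        ceph osd crush reweight osd.12 0.5
        # ...later 1.0, 1.5, and finally 1.81999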

--
PS

