Hello,
I have a Proxmox Ceph cluster with 5 nodes and 3 OSDs each (15 OSDs total), on
a 10G network.
The cluster started small, and I’ve progressively added OSDs over time.
The problem is that the cluster never finishes rebalancing. Backfilling always
makes progress, but PGs that used to be active+clean jump back into
active+remapped+backfilling (or active+remapped+backfill_wait) and get
moved to different OSDs.
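For what it's worth, this is roughly how I've been watching it happen (just a
sketch of the checks I run; the grep pattern is simply the state string from
above):

ceph pg dump pgs_brief | grep -c remapped   # count of PGs currently in a remapped state
ceph -s                                     # overall recovery/backfill summary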
Initially I had a 1G network, so I was holding back the backfill settings
(osd_max_backfills and osd_recovery_sleep_hdd). In the last few weeks I
upgraded to 10G and raised them to osd_max_backfills = 50 and
osd_recovery_sleep_hdd = 0 (the cluster is all HDDs, no SSDs). The cluster has
been backfilling for months now with no end in sight.
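For reference, this is how I applied the new values (a sketch from memory; I
set them in the cluster-wide config store rather than per daemon):

ceph config set osd osd_max_backfills 50       # allow up to 50 concurrent backfills per OSD
ceph config set osd osd_recovery_sleep_hdd 0   # remove the sleep between recovery ops on HDDs

ceph config get osd osd_max_backfills          # verify what the OSDs actually see
ceph config get osd osd_recovery_sleep_hdd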
Is this normal behavior? Is there a setting I can look at that will give me an
idea why PGs are jumping back to remapped from clean?
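One thing I haven't ruled out is the mgr balancer module; as I understand it,
when active (e.g. in upmap mode) it keeps generating new PG mappings, which
would look exactly like clean PGs dropping back to remapped. Is this the right
place to look?

ceph balancer status   # shows whether the balancer is active and which mode it uses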
Below is the output of “ceph osd tree” and “ceph osd df”:
# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME            STATUS  REWEIGHT  PRI-AFF
 -1         203.72472  root default
 -9          40.01666      host vis-hsw-01
  3    hdd   10.91309          osd.3             up   1.00000  1.00000
  6    hdd   14.55179          osd.6             up   1.00000  1.00000
 10    hdd   14.55179          osd.10            up   1.00000  1.00000
-13          40.01666      host vis-hsw-02
  0    hdd   10.91309          osd.0             up   1.00000  1.00000
  7    hdd   14.55179          osd.7             up   1.00000  1.00000
 11    hdd   14.55179          osd.11            up   1.00000  1.00000
-11          40.01666      host vis-hsw-03
  4    hdd   10.91309          osd.4             up   1.00000  1.00000
  8    hdd   14.55179          osd.8             up   1.00000  1.00000
 12    hdd   14.55179          osd.12            up   1.00000  1.00000
 -3          40.01666      host vis-hsw-04
  5    hdd   10.91309          osd.5             up   1.00000  1.00000
  9    hdd   14.55179          osd.9             up   1.00000  1.00000
 13    hdd   14.55179          osd.13            up   1.00000  1.00000
-15          43.65807      host vis-hsw-05
  1    hdd   14.55269          osd.1             up   1.00000  1.00000
  2    hdd   14.55269          osd.2             up   1.00000  1.00000
 14    hdd   14.55269          osd.14            up   1.00000  1.00000
 -5                 0      host vis-ivb-07
 -7                 0      host vis-ivb-10
#
# ceph osd df
ID  CLASS  WEIGHT    REWEIGHT  SIZE     RAW USE  DATA     OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
 3    hdd  10.91309   1.00000   11 TiB  8.2 TiB  8.2 TiB  552 MiB  25 GiB  2.7 TiB  75.08  1.19  131      up
 6    hdd  14.55179   1.00000   15 TiB  9.1 TiB  9.1 TiB  1.2 GiB  30 GiB  5.5 TiB  62.47  0.99  148      up
10    hdd  14.55179   1.00000   15 TiB  8.1 TiB  8.1 TiB  1.5 GiB  20 GiB  6.4 TiB  55.98  0.89  142      up
 0    hdd  10.91309   1.00000   11 TiB  7.5 TiB  7.4 TiB  504 MiB  24 GiB  3.5 TiB  68.34  1.09  120      up
 7    hdd  14.55179   1.00000   15 TiB  8.7 TiB  8.7 TiB  1.0 GiB  31 GiB  5.8 TiB  60.07  0.95  144      up
11    hdd  14.55179   1.00000   15 TiB  9.4 TiB  9.3 TiB  819 MiB  20 GiB  5.2 TiB  64.31  1.02  147      up
 4    hdd  10.91309   1.00000   11 TiB  7.0 TiB  7.0 TiB  284 MiB  25 GiB  3.9 TiB  64.35  1.02  112      up
 8    hdd  14.55179   1.00000   15 TiB  9.3 TiB  9.2 TiB  1.8 GiB  29 GiB  5.3 TiB  63.65  1.01  157      up
12    hdd  14.55179   1.00000   15 TiB  8.6 TiB  8.6 TiB  623 MiB  19 GiB  5.9 TiB  59.14  0.94  136      up
 5    hdd  10.91309   1.00000   11 TiB  8.6 TiB  8.6 TiB  542 MiB  29 GiB  2.3 TiB  79.01  1.26  134      up
 9    hdd  14.55179   1.00000   15 TiB  8.2 TiB  8.2 TiB  707 MiB  27 GiB  6.3 TiB  56.56  0.90  138      up
13    hdd  14.55179   1.00000   15 TiB  8.7 TiB  8.7 TiB  741 MiB  18 GiB  5.8 TiB  59.85  0.95  134      up
 1    hdd  14.55269   1.00000   15 TiB  9.8 TiB  9.8 TiB  1.3 GiB  20 GiB  4.8 TiB  67.18  1.07  158      up
 2    hdd  14.55269   1.00000   15 TiB  8.7 TiB  8.7 TiB  936 MiB  18 GiB  5.8 TiB  60.04  0.95  148      up
14    hdd  14.55269   1.00000   15 TiB  8.3 TiB  8.3 TiB  673 MiB  18 GiB  6.3 TiB  56.97  0.90  131      up
                      TOTAL    204 TiB  128 TiB  128 TiB   13 GiB  350 GiB   75 TiB  62.95
MIN/MAX VAR: 0.89/1.26  STDDEV: 6.44
#
Thank you!
George