Hello, I'd like to ask a few rebalancing-related questions. On one of
my clusters, I got a nearfull warning for one of the OSDs. Apart from
that, the cluster health was perfectly OK, all PGs active+clean.
Therefore I used reweight-by-utilization, which changed the weights a
bit, causing about 30% of the data to be misplaced. After that, recovery
started, but it didn't get the cluster back to a clean state - some PGs
ended up in the remapped state and, even worse, some of them were left
undersized.
Even though I set the weights back to their pre-reweight values, it didn't help.
I'd like to ask the more experienced users:
1) when I have a cluster with evenly distributed OSDs and weights, can it
happen that one OSD suddenly gets much more full than the others?
2) why does rebalancing weights lead to undersized PGs? Isn't this a bug
leading to an unnecessary risk of data loss?
3) why does changing the weights by only a small amount lead to such big data
transfers? I changed the weight of only one OSD (out of 15), and only by a
small amount, and it caused about 30% misplaced placement groups.. is this OK?
4) after some experiments, I also got a few PGs stuck in the stale+active+clean
or creating state.. how do I get rid of those?
5) last but not least, how can I help my cluster get back to a clean
state?
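Regarding question 3, here's a quick toy model I used to sanity-check my expectation (a rendezvous-hash sketch in the spirit of straw2, NOT real CRUSH; all names and numbers here are made up to roughly match my setup):

```python
# Toy illustration: with straw2-style weighted placement, lowering ONE
# OSD's weight should only remap data proportional to the weight delta.
# This is NOT real CRUSH -- just a rendezvous-hash sketch with
# hypothetical names, used to sanity-check the expectation.
import hashlib
import math

def draw(pg, osd):
    """Deterministic pseudo-random value in (0, 1] for a pg/osd pair."""
    h = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / 2**64

def place(pg, weights):
    """Pick the OSD with the best weighted draw (straw2-like: log(u)/w, max wins)."""
    return max(weights, key=lambda osd: math.log(draw(pg, osd)) / weights[osd])

weights = {f"osd.{i}": 0.93 for i in range(15)}          # 15 roughly equal OSDs
before = {pg: place(pg, weights) for pg in range(2000)}  # 2000 toy PGs

weights["osd.3"] = 0.86                                  # cut ONE weight by ~8%
after = {pg: place(pg, weights) for pg in range(2000)}

moved = sum(before[pg] != after[pg] for pg in before) / len(before)
print(f"fraction of PGs remapped: {moved:.3%}")
```

In this toy model a single-OSD tweak of that size remaps well under 1% of PGs (per replica), so if 30% of my data is misplaced, it makes me suspect reweight-by-utilization actually adjusted many REWEIGHT values at once - the df tree below does show most SSD OSDs with reweights far from 1.0.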
here's the osd df tree:
[root@remrprv1c ceph]# ceph osd df tree
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR TYPE NAME
-8 13.54486 - 13534G 8018G 5515G 59.24 1.63 root ssd
-2 4.51495 - 4511G 2869G 1641G 63.61 1.75 host remrprv1a-ssd
0 0.85999 0.38860 859G 594G 264G 69.22 1.90 osd.0
1 0.85999 0.33694 859G 557G 301G 64.89 1.78 osd.1
2 0.92999 0.44678 929G 617G 312G 66.43 1.82 osd.2
3 0.92999 0.32753 929G 580G 348G 62.46 1.71 osd.3
4 0.93500 0.31308 934G 519G 414G 55.60 1.53 osd.4
-3 4.51495 - 4511G 2595G 1915G 57.54 1.58 host remrprv1b-ssd
5 0.85999 0.31793 859G 456G 402G 53.16 1.46 osd.5
6 0.85999 0.40715 859G 502G 356G 58.47 1.60 osd.6
7 0.92999 0.38741 929G 500G 428G 53.87 1.48 osd.7
8 0.92999 0.38803 929G 607G 322G 65.30 1.79 osd.8
9 0.93500 0.36951 934G 529G 405G 56.64 1.55 osd.9
-4 4.51495 - 4511G 2552G 1958G 56.59 1.55 host remrprv1c-ssd
10 0.85999 0.34116 859G 456G 402G 53.11 1.46 osd.10
11 0.85999 0.38770 859G 488G 370G 56.88 1.56 osd.11
12 0.92999 0.41499 929G 556G 372G 59.90 1.64 osd.12
13 0.92999 0.35764 929G 534G 394G 57.53 1.58 osd.13
14 0.93500 0.38669 934G 516G 417G 55.29 1.52 osd.14
-1 21.59995 - 22004G 4929G 17074G 22.40 0.61 root sata
-7 7.19998 - 7334G 1644G 5690G 22.42 0.62 host remrprv1c-sata
19 3.59999 1.00000 3667G 819G 2848G 22.33 0.61 osd.19
20 3.59999 1.00000 3667G 825G 2841G 22.51 0.62 osd.20
-6 7.19998 - 7334G 1642G 5691G 22.40 0.61 host remrprv1b-sata
17 3.59999 1.00000 3667G 806G 2860G 21.99 0.60 osd.17
18 3.59999 1.00000 3667G 836G 2831G 22.80 0.63 osd.18
-5 7.19998 - 7334G 1642G 5692G 22.39 0.61 host remrprv1a-sata
15 3.59999 1.00000 3667G 853G 2813G 23.28 0.64 osd.15
16 3.59999 1.00000 3667G 788G 2879G 21.49 0.59 osd.16
TOTAL 35538G 12948G 22590G 36.43
MIN/MAX VAR: 0.59/1.90 STDDEV: 19.22
here's ceph -s:
[root@remrprv1c ceph]# ceph -s
cluster ff21618e-5aea-4cfe-83b6-a0d2d5b4052a
health HEALTH_WARN
3 pgs degraded
2 pgs stale
3 pgs stuck degraded
1 pgs stuck inactive
2 pgs stuck stale
242 pgs stuck unclean
3 pgs stuck undersized
3 pgs undersized
recovery 75/3374541 objects degraded (0.002%)
recovery 186194/3374541 objects misplaced (5.518%)
mds0: Behind on trimming (155/30)
monmap e3: 3 mons at
{remrprv1a=10.0.0.1:6789/0,remrprv1b=10.0.0.2:6789/0,remrprv1c=10.0.0.3:6789/0}
election epoch 522, quorum 0,1,2 remrprv1a,remrprv1b,remrprv1c
mdsmap e347: 1/1/1 up {0=remrprv1a=up:active}, 2 up:standby
osdmap e4423: 21 osds: 21 up, 21 in; 238 remapped pgs
pgmap v18686541: 1856 pgs, 7 pools, 4224 GB data, 1103 kobjects
12948 GB used, 22590 GB / 35538 GB avail
75/3374541 objects degraded (0.002%)
186194/3374541 objects misplaced (5.518%)
1612 active+clean
238 active+remapped
3 active+undersized+degraded
2 stale+active+clean
1 creating
client io 14830 B/s rd, 269 kB/s wr, 94 op/s
I'd be very grateful for any help with these..
with best regards
nik
--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava
tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz
mobil servis: +420 737 238 656
email servis: [email protected]
-------------------------------------
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
