Thanks for the response.
> You really will want to spend more time reading documentation and this ML,
> as well as using google to (re-)search things.
I did do some reading on the errors but cannot understand why they
have not cleared even after so long.
> In your previous mail you already mentioned a 92% full OSD, that should
> combined with the various "full" warnings have impressed on you the need
> to address this issue.
> When your nodes all rebooted, did everything come back up?
One host with 5 OSDs was down and came back up later.
> And if so (as the 15 osds: 15 up, 15 in suggest), how much separated in
time?
about 7 hours
> My guess is that some nodes/OSDs were restarted a lot later than others.
True
> Bad, Ceph wants to place data onto these 2 PGs, but their OSDs are too
> full for that.
> And until something changes it will be stuck there.
> Your best bet is to add more OSDs, since you seem to be short on space
> anyway. Or delete unneeded data.
> Given your level of experience, I'd advise against playing with weights
> and the respective "full" configuration options.
I did reweight some OSDs, but everything is back to normal now. No
changes were made to the "full" config options.
I deleted about 900G this morning and have prepared 3 new OSDs; should I add them now?
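For a rough sense of what those 3 OSDs would buy, here is a back-of-the-envelope sketch in plain Python (nothing Ceph-specific), using the totals from the "ceph osd df" output below and assuming the new drives match the existing 930G ones:

```python
# Estimate average cluster utilisation before/after adding OSDs.
# USED_GB/TOTAL_GB come from the "ceph osd df" TOTAL row; the 930G
# size of the 3 new OSDs is an assumption (matching the existing drives).
USED_GB = 8683
TOTAL_GB = 13958

def avg_use_pct(used_gb, total_gb):
    """Average utilisation as a percentage of raw capacity."""
    return 100.0 * used_gb / total_gb

print(avg_use_pct(USED_GB, TOTAL_GB))            # ~62.2% today
print(avg_use_pct(USED_GB, TOTAL_GB + 3 * 930))  # ~51.8% with 3 more OSDs
```

So even with the new OSDs the cluster stays above 50% raw usage, and CRUSH will not fill OSDs perfectly evenly, so the per-OSD numbers still need watching.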
> Are these numbers and the recovery io below still changing, moving along?
original email:
> recovery 493335/3099981 objects degraded (15.914%)
> recovery 1377464/3099981 objects misplaced (44.435%)
current email:
recovery 389973/3096070 objects degraded (12.596%)
recovery 1258984/3096070 objects misplaced (40.664%)
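(For anyone following along: those percentages are just raw object counts over total objects, e.g. in plain Python:)

```python
# Ceph's degraded/misplaced percentages are count / total_objects * 100,
# using the counts from the "recovery" lines above.
def recovery_pct(count, total_objects):
    return round(100.0 * count / total_objects, 3)

print(recovery_pct(389973, 3096070))   # 12.596 -> degraded
print(recovery_pct(1258984, 3096070))  # 40.664 -> misplaced
```

Both figures are lower than in the original mail, so backfill is making progress, just slowly.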
> Just to confirm, that's all the 15 OSDs your cluster ever had?
yes
> Output from "ceph osd df" and "ceph osd tree" please.
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
3 0.90868 1.00000 930G 232G 698G 24.96 0.40 105
5 0.90868 1.00000 930G 139G 791G 14.99 0.24 139
6 0.90868 1.00000 930G 61830M 870G 6.49 0.10 138
0 0.90868 1.00000 930G 304G 625G 32.76 0.53 128
2 0.90868 1.00000 930G 24253M 906G 2.55 0.04 130
1 0.90868 1.00000 930G 793G 137G 85.22 1.37 162
4 0.90868 1.00000 930G 790G 140G 84.91 1.36 160
7 0.90868 1.00000 930G 803G 127G 86.34 1.39 144
10 0.90868 1.00000 930G 792G 138G 85.16 1.37 145
13 0.90868 1.00000 930G 811G 119G 87.17 1.40 163
15 0.90869 1.00000 930G 794G 136G 85.37 1.37 157
16 0.90869 1.00000 930G 757G 172G 81.45 1.31 159
17 0.90868 1.00000 930G 800G 129G 86.06 1.38 144
18 0.90869 1.00000 930G 786G 144G 84.47 1.36 166
19 0.90868 1.00000 930G 793G 137G 85.26 1.37 160
TOTAL 13958G 8683G 5274G 62.21
MIN/MAX VAR: 0.04/1.40 STDDEV: 33.10
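(A note on the VAR column, since it tells the story here: it is each OSD's %USE divided by the cluster-average %USE, so values far from 1.0 mark the imbalanced OSDs. A quick sketch with the numbers from the table above:)

```python
# VAR in "ceph osd df" = per-OSD %USE / cluster-average %USE.
AVG_USE = 62.21  # from the TOTAL row above

def var(use_pct, avg_use=AVG_USE):
    return round(use_pct / avg_use, 2)

print(var(2.55))   # 0.04 -> osd.2, nearly empty
print(var(87.17))  # 1.4  -> osd.13, the fullest (table: 1.40)
```

That 0.04-to-1.40 spread matches the MIN/MAX VAR line: the recently rebooted host's OSDs are nearly empty while the others sit in the mid-80s, which is exactly where the near-full warnings come from.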
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 13.63019 root default
-2 4.54338 host nodeB
3 0.90868 osd.3 up 1.00000 1.00000
5 0.90868 osd.5 up 1.00000 1.00000
6 0.90868 osd.6 up 1.00000 1.00000
0 0.90868 osd.0 up 1.00000 1.00000
2 0.90868 osd.2 up 1.00000 1.00000
-3 4.54338 host nodeC
1 0.90868 osd.1 up 1.00000 1.00000
4 0.90868 osd.4 up 1.00000 1.00000
7 0.90868 osd.7 up 1.00000 1.00000
10 0.90868 osd.10 up 1.00000 1.00000
13 0.90868 osd.13 up 1.00000 1.00000
-6 4.54343 host nodeD
15 0.90869 osd.15 up 1.00000 1.00000
16 0.90869 osd.16 up 1.00000 1.00000
17 0.90868 osd.17 up 1.00000 1.00000
18 0.90869 osd.18 up 1.00000 1.00000
19 0.90868 osd.19 up 1.00000 1.00000
On Thu, Sep 1, 2016 at 10:56 AM, Christian Balzer <[email protected]> wrote:
>
>
> Hello,
>
> On Thu, 1 Sep 2016 10:18:39 +0200 Ishmael Tsoaela wrote:
>
> > Hi All,
> >
> > Can someone please decipher these errors for me? After all nodes rebooted in
> > my cluster on Monday, the warning has not gone away.
> >
> You really will want to spend more time reading documentation and this ML,
> as well as using google to (re-)search things.
> Like searching for "backfill_toofull", "near full", etc.
>
>
> > Will the warning ever clear?
> >
> Unlikely.
>
> In your previous mail you already mentioned a 92% full OSD, that should
> combined with the various "full" warnings have impressed on you the need
> to address this issue.
>
> When your nodes all rebooted, did everything come back up?
> And if so (as the 15 osds: 15 up, 15 in suggest), how much separated in
> time?
> My guess is that some nodes/OSDs were restarted a lot later than others.
>
> See inline:
> >
> > cluster df3f96d8-3889-4baa-8b27-cc2839141425
> > health HEALTH_WARN
> > 2 pgs backfill_toofull
> Bad, Ceph wants to place data onto these 2 PGs, but their OSDs are too
> full for that.
> And until something changes it will be stuck there.
>
> Your best bet is to add more OSDs, since you seem to be short on space
> anyway. Or delete unneeded data.
> Given your level of experience, I'd advise against playing with weights
> and the respective "full" configuration options.
>
> > 532 pgs backfill_wait
> > 3 pgs backfilling
> > 330 pgs degraded
> > 537 pgs stuck unclean
> > 330 pgs undersized
> > recovery 493335/3099981 objects degraded (15.914%)
> > recovery 1377464/3099981 objects misplaced (44.435%)
> Are these numbers and the recovery io below still changing, moving along?
>
> > 8 near full osd(s)
> 8 out of 15, definitely needs more OSDs.
> Output from "ceph osd df" and "ceph osd tree" please.
>
> > monmap e7: 3 mons at {Monitors}
> > election epoch 118, quorum 0,1,2 nodeB,nodeC,nodeD
> > osdmap e3922: 15 osds: 15 up, 15 in; 537 remapped pgs
>
> Just to confirm, that's all the 15 OSDs your cluster ever had?
>
> Christian
>
> > flags sortbitwise
> > pgmap v2431741: 640 pgs, 3 pools, 3338 GB data, 864 kobjects
> > 8242 GB used, 5715 GB / 13958 GB avail
> > 493335/3099981 objects degraded (15.914%)
> > 1377464/3099981 objects misplaced (44.435%)
> > 327 active+undersized+degraded+remapped+wait_backfill
> > 205 active+remapped+wait_backfill
> > 103 active+clean
> > 3 active+undersized+degraded+remapped+backfilling
> > 2 active+remapped+backfill_toofull
> > recovery io 367 MB/s, 96 objects/s
> > client io 5699 B/s rd, 23749 B/s wr, 2 op/s rd, 12 op/s wr
>
>
> --
> Christian Balzer Network/Systems Engineer
> [email protected] Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com