Hi there! We also see this behaviour on our cluster while it is moving PGs.
# ceph health detail
HEALTH_ERR 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive; Degraded data redundancy (low space): 1 pg backfill_toofull
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 359 secs
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
pg 21.231 is stuck inactive for 878.224182, current state remapped, last acting [20,2147483647,13,2147483647,15,10]
pg 21.240 is stuck inactive for 878.123932, current state remapped, last acting [26,17,21,20,2147483647,2147483647]
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
pg 21.376 is active+remapped+backfill_wait+backfill_toofull, acting [6,11,29,2,10,15]
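The 2147483647 entries above are 0x7fffffff, CRUSH's ITEM_NONE placeholder, so those shards of the PG (presumably erasure-coded, given the six-entry acting sets) currently have no OSD assigned. A quick way to list all PGs in that situation, assuming the plain-text dump format:
# ceph pg dump pgs_brief 2>/dev/null | grep 2147483647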
# ceph pg map 21.376
osdmap e68016 pg 21.376 (21.376) -> up [6,5,23,21,10,11] acting [6,11,29,2,10,15]
# ceph osd dump | fgrep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
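Those are the defaults here. If a backfill is blocked purely on that ratio, it can be raised temporarily, assuming the target OSDs really do have headroom, e.g.:
# ceph osd set-backfillfull-ratio 0.92
and set back to 0.9 once the data has moved.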
This happens while the cluster is rebalancing PGs after I manually mark a single OSD out.
See here:
Subject: [ceph-users] pg 21.1f9 is stuck inactive for 53316.902820, current state remapped
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036634.html
Usually the cluster heals itself, at least as far as HEALTH_WARN:
# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 2 pgs inactive
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsmds1(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 1155 secs
PG_AVAILABILITY Reduced data availability: 2 pgs inactive
pg 21.231 is stuck inactive for 1677.312219, current state remapped, last acting [20,2147483647,13,2147483647,15,10]
pg 21.240 is stuck inactive for 1677.211969, current state remapped, last acting [26,17,21,20,2147483647,2147483647]
Cheers,
Lars
Wed, 21 Aug 2019 17:28:05 -0500
Reed Dier <[email protected]> ==> Vladimir Brik <[email protected]>:
> Just chiming in to say that I too had some issues with backfill_toofull PGs,
> despite no OSDs being in a backfillfull state, although there were some
> nearfull OSDs.
>
> I was able to get through it by reweighting down the OSD that was the target
> reported by ceph pg dump | grep 'backfill_toofull'.
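>
> In case it's useful, the reweight was of this general form; the OSD id and
> weight are placeholders rather than the values I actually used:
> ceph osd reweight <osd-id> 0.95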
>
> This was on 14.2.2.
>
> Reed
>
> > On Aug 21, 2019, at 2:50 PM, Vladimir Brik <[email protected]> wrote:
> >
> > Hello
> >
> > After increasing the number of PGs in a pool, ceph status is reporting
> > "Degraded data redundancy (low space): 1 pg backfill_toofull", but I don't
> > understand why, because all OSDs seem to have enough space.
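> >
> > For context, the pg_num increase was of the usual form; the pool name and
> > target value here are placeholders, not what I actually ran:
> > ceph osd pool set <pool> pg_num 1024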
> >
> > ceph health detail says:
> > pg 40.155 is active+remapped+backfill_toofull, acting [20,57,79,85]
> >
> > $ ceph pg map 40.155
> > osdmap e3952 pg 40.155 (40.155) -> up [20,57,66,85] acting [20,57,79,85]
> >
> > So I guess Ceph wants to move 40.155 from 66 to 79 (or the other way around?).
> > According to "osd df", OSD 66's utilization is 71.90%, OSD 79's utilization
> > is 58.45%. The OSD with least free space in the cluster is 81.23% full, and
> > it's not any of the ones above.
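> >
> > One way to double-check the utilization of just those two OSDs, assuming
> > the ID is the first column of "ceph osd df" output:
> > $ ceph osd df | egrep '^ *(66|79) '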
> >
> > OSD backfillfull_ratio is 90% (is there a better way to determine this?):
> > $ ceph osd dump | grep ratio
> > full_ratio 0.95
> > backfillfull_ratio 0.9
> > nearfull_ratio 0.7
> >
> > Does anybody know why a PG could be in the backfill_toofull state if no OSD
> > is in the backfillfull state?
> >
> >
> > Vlad
>
--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23 10117 Berlin
Tel.: +49 30 20370-352 http://www.bbaw.de
