Hi,

Good point! Changing this value *and* restarting ceph-mgr fixed the
issue. Now we have to find a way to reduce the PG count.
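
For the record, the change amounted to something like this (a rough
sketch rather than the exact commands; the mgr unit name is a
placeholder, and the option can also be made permanent in ceph.conf):

    # raise the limit on the running monitors (temporary, lost on restart)
    ceph tell mon.* injectargs '--mon_max_pg_per_osd=300'

    # persist it so the mons/mgr pick it up again after a restart:
    #   [global]
    #   mon max pg per osd = 300

    # restart the active mgr so the new limit is taken into account
    systemctl restart ceph-mgr@hyp01-sbg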

Thanks, Paul!

Olivier

On Tuesday, June 5, 2018 at 10:39 +0200, Paul Emmerich wrote:
> Hi,
> 
> looks like you are running into the PG overdose protection of
> Luminous (you have > 200 PGs per OSD): try increasing
> mon_max_pg_per_osd on the monitors to 300 or so to temporarily
> resolve this.
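> 
> A quick way to confirm the situation (assuming standard tooling; the
> mon id below is a placeholder):
> 
>     # per-OSD PG counts are in the PGS column
>     ceph osd df
> 
>     # current limit on a monitor, via its admin socket
>     ceph daemon mon.hyp01-sbg config get mon_max_pg_per_osd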
> 
> Paul
> 
> 2018-06-05 9:40 GMT+02:00 Olivier Bonvalet <ceph.l...@daevel.fr>:
> > Some more information: the cluster was just upgraded from Jewel to
> > Luminous.
> > 
> > # ceph pg dump | egrep '(stale|creating)'
> > dumped all
> > 15.32  10947  0  0  0  0  45870301184  3067  3067  stale+active+clean  2018-06-04 09:20:42.594317  387644'251008  437722:754803  [48,31,45]  48  [48,31,45]  48  213014'224196  2018-04-22 02:01:09.148152  200181'219150  2018-04-14 14:40:13.116285  0
> > 19.77  4131  0  0  0  0  17326669824  3076  3076  stale+down  2018-06-05 07:28:33.968860  394478'58307  438699:736881  [NONE,20,76]  20  [NONE,20,76]  20  273736'49495  2018-05-17 01:05:35.523735  273736'49495  2018-05-17 01:05:35.523735  0
> > 13.76  10730  0  0  0  0  44127133696  3011  3011  stale+down  2018-06-05 07:30:27.578512  397231'457143  438813:4600135  [NONE,21,76]  21  [NONE,21,76]  21  286462'438402  2018-05-20 18:06:12.443141  286462'438402  2018-05-20 18:06:12.443141  0
> > 
> > 
> > 
> > 
> > On Tuesday, June 5, 2018 at 09:25 +0200, Olivier Bonvalet wrote:
> > > Hi,
> > > 
> > > I have a cluster in a "stale" state: a lot of RBDs have been blocked
> > > for ~10 hours. In the status I see PGs in stale or down state, but
> > > those PGs don't seem to exist anymore:
> > > 
> > > root! stor00-sbg:~# ceph health detail | egrep '(stale|down)'
> > > HEALTH_ERR noout,noscrub,nodeep-scrub flag(s) set; 1 nearfull osd(s); 16 pool(s) nearfull; 4645278/103969515 objects misplaced (4.468%); Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale; Degraded data redundancy: 2723173/103969515 objects degraded (2.619%), 387 pgs degraded, 297 pgs undersized; 229 slow requests are blocked > 32 sec; 4074 stuck requests are blocked > 4096 sec; too many PGs per OSD (202 > max 200); mons hyp01-sbg,hyp02-sbg,hyp03-sbg are using a lot of disk space
> > > PG_AVAILABILITY Reduced data availability: 643 pgs inactive, 12 pgs down, 2 pgs peering, 3 pgs stale
> > >     pg 31.8b is down, acting [2147483647,16,36]
> > >     pg 31.8e is down, acting [2147483647,29,19]
> > >     pg 46.b8 is down, acting [2147483647,2147483647,13,17,47,28]
> > > 
> > > root! stor00-sbg:~# ceph pg 31.8b query
> > > Error ENOENT: i don't have pgid 31.8b
> > > 
> > > root! stor00-sbg:~# ceph pg 31.8e query
> > > Error ENOENT: i don't have pgid 31.8e
> > > 
> > > root! stor00-sbg:~# ceph pg 46.b8 query
> > > Error ENOENT: i don't have pgid 46.b8
> > > 
> > > 
> > > We just lost an HDD, and marked the corresponding OSD as "lost".
> > > 
> > > Any idea of what I should do?
> > > 
> > > Thanks,
> > > 
> > > Olivier
> 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
