Hi,

What happened to the 2 missing OSDs?

53 osds: 51 up, 51 in
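
(Which ones they are should show up with e.g.

  ceph osd tree | grep down

listing the down OSDs and the hosts they're under.)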

Thanks

On Tue, Jul 5, 2016 at 4:04 PM, Matyas Koszik <kos...@atw.hu> wrote:

>
> Should you be interested, the solution to this was
> ceph pg $pg mark_unfound_lost delete
> for all PGs that had unfound objects; the cluster is now back in a
> healthy state.
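>
> Something along these lines, assuming the PG ids show up in the
> "pg ... unfound" lines of ceph health detail:
>
>   for pg in $(ceph health detail | awk '/^pg .*unfound/ {print $2}'); do
>       ceph pg $pg mark_unfound_lost delete
>   done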
>
> I think this is very counter-intuitive (why should totally unrelated PGs
> be affected by this?!), but at least the solution was simple.
>
> Matyas
>
> On Mon, 4 Jul 2016, Oliver Dzombic wrote:
>
> > Hi,
> >
> > Did you already do something (replace drives or change anything)?
> >
> > You have 11 scrub errors and ~11 inconsistent PGs.
> >
> > The inconsistent PGs, for example:
> >
> > pg 4.3a7 is stuck unclean for 629.766502, current state
> > active+recovery_wait+degraded+inconsistent, last acting [10,21]
> >
> > are on neither of the down OSDs, 1 and 22.
> >
> > So they should not be missing. But they are.
> >
> > Anyway, I think the next step would be to run a pg repair command and
> > see where that leads.
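> >
> > For example (4.3a7 here is just the example PG from above):
> >
> >   rados list-inconsistent-obj 4.3a7 --format=json-pretty
> >   ceph pg repair 4.3a7
> >
> > The first command (new in jewel, as far as I know) shows what exactly
> > differs between the replicas before you kick off the repair.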
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:i...@ip-interactive.de
> >
> > Address:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402, Amtsgericht Hanau (district court)
> > Managing director: Oliver Dzombic
> >
> > Tax no.: 35 236 3622 1
> > VAT ID: DE274086107
> >
> >
> > > On 03.07.2016 at 23:59, Matyas Koszik wrote:
> > >
> > > Hi,
> > >
> > > I've continued restarting OSDs in the meantime, and it has gotten
> > > somewhat better, but it's still very far from optimal.
> > >
> > > Here are the details you requested:
> > >
> > > http://pastebin.com/Vqgadz24
> > >
> > > http://pastebin.com/vCL6BRvC
> > >
> > > Matyas
> > >
> > >
> > > On Sun, 3 Jul 2016, Oliver Dzombic wrote:
> > >
> > >> Hi,
> > >>
> > >> please provide:
> > >>
> > >> ceph health detail
> > >>
> > >> ceph osd tree
> > >>
> > >>
> > >> On 03.07.2016 at 21:36, Matyas Koszik wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> I recently upgraded to jewel (10.2.2) and now I'm confronted with a
> > >>> rather strange behavior: recovery does not progress the way it
> > >>> should. If I restart the OSDs on a host, it gets a bit better (or
> > >>> worse), like this:
> > >>>
> > >>> 50 pgs undersized
> > >>> recovery 43775/7057285 objects degraded (0.620%)
> > >>> recovery 87980/7057285 objects misplaced (1.247%)
> > >>>
> > >>> [restart osds on node1]
> > >>>
> > >>> 44 pgs undersized
> > >>> recovery 39623/7061519 objects degraded (0.561%)
> > >>> recovery 92142/7061519 objects misplaced (1.305%)
> > >>>
> > >>> [restart osds on node1]
> > >>>
> > >>> 43 pgs undersized
> > >>> 1116 requests are blocked > 32 sec
> > >>> recovery 38181/7061529 objects degraded (0.541%)
> > >>> recovery 90617/7061529 objects misplaced (1.283%)
> > >>>
> > >>> ...
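> > >>>
> > >>> (For the record, I restart all OSDs on a node with something like
> > >>>
> > >>>   systemctl restart ceph-osd.target
> > >>>
> > >>> on the node in question, assuming the default jewel systemd units;
> > >>> a single OSD would be e.g. systemctl restart ceph-osd@21.)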
> > >>>
> > >>> The current state is this:
> > >>>
> > >>>  osdmap e38804: 53 osds: 51 up, 51 in; 66 remapped pgs
> > >>>   pgmap v14797137: 4388 pgs, 8 pools, 13626 GB data, 3434 kobjects
> > >>>         27474 GB used, 22856 GB / 50330 GB avail
> > >>>         38172/7061565 objects degraded (0.541%)
> > >>>         90617/7061565 objects misplaced (1.283%)
> > >>>         8/3517300 unfound (0.000%)
> > >>>             4202 active+clean
> > >>>              109 active+recovery_wait+degraded
> > >>>               38 active+undersized+degraded+remapped+wait_backfill
> > >>>               15 active+remapped+wait_backfill
> > >>>               11 active+clean+inconsistent
> > >>>                8 active+recovery_wait+degraded+remapped
> > >>>                3 active+recovering+undersized+degraded+remapped
> > >>>                2 active+recovery_wait+undersized+degraded+remapped
> > >>>
> > >>>
> > >>> All the pools have size=2 min_size=1.
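> > >>>
> > >>> (This can be checked with e.g.
> > >>>
> > >>>   ceph osd dump | grep 'replicated size'
> > >>>
> > >>> which prints one line per pool including its size and min_size.)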
> > >>>
> > >>> (All the unfound objects are on undersized PGs, and I don't seem
> > >>> to be able to fix them without having replicas (?). The objects
> > >>> exist, but are outdated, from an earlier problem.)
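> > >>>
> > >>> (The unfound objects per PG can be listed with e.g.
> > >>>
> > >>>   ceph pg 4.3a7 list_unfound
> > >>>
> > >>> using the PG ids from ceph health detail; 4.3a7 is just an example
> > >>> id here.)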
> > >>>
> > >>>
> > >>>
> > >>> Matyas
> > >>>
> > >>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
