An OSD that is down does not recover or backfill. Faster recovery or
backfill will not resolve down OSDs.
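
To see which OSDs are actually down before touching any recovery settings, something like the following could help. It is only a minimal sketch: it assumes the `ceph` CLI and an admin keyring are available on the host, and that `ceph osd tree --format json` returns a "nodes" list whose OSD entries carry "name", "type" and "status" fields. The helper names and the sample data are made up for illustration.

```python
import json
import subprocess


def fetch_osd_tree():
    """Run `ceph osd tree --format json` and return the parsed output.

    Assumes the `ceph` CLI and an admin keyring are available on this host.
    """
    out = subprocess.run(
        ["ceph", "osd", "tree", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def down_osds(tree):
    """Return the names of OSDs whose status is reported as 'down'."""
    return sorted(
        node["name"]
        for node in tree.get("nodes", [])
        if node.get("type") == "osd" and node.get("status") == "down"
    )


# Made-up sample in the assumed shape of the JSON output, for illustration only.
sample = {
    "nodes": [
        {"id": -1, "name": "default", "type": "root"},
        {"id": 203, "name": "osd.203", "type": "osd", "status": "up"},
        {"id": 374, "name": "osd.374", "type": "osd", "status": "down"},
    ]
}
print(down_osds(sample))
```

On a live cluster, `down_osds(fetch_osd_tree())` would give the list to investigate.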


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Dec 9, 2019 at 1:42 PM Thomas Schneider <74cmo...@gmail.com> wrote:

> Hi,
>
> I think I can speed up the recovery / backfill.
>
> What are the recommended settings for
> osd_max_backfills
> osd_recovery_max_active
> ?
>
> THX
>
> On 09.12.2019 at 13:36, Paul Emmerich wrote:
> > This message is expected.
> >
> > But your current situation is a great example of why having a separate
> > cluster network is a bad idea in most situations.
> > The first thing I'd do in this scenario is to get rid of the cluster
> > network and see if that helps.
> >
> >
> > On Mon, Dec 9, 2019 at 11:22 AM Thomas Schneider <74cmo...@gmail.com> wrote:
> >
> >     Hi,
> >     I had a failure on 2 of 7 OSD nodes.
> >     This caused a server reboot, and unfortunately the cluster network
> >     failed to come up.
> >
> >     This left many OSDs marked as down.
> >
> >     I decided to stop all services (OSD, MGR, MON) and to start them
> >     sequentially.
> >
> >     Now I have multiple OSDs marked as down although their services are
> >     running.
> >     None of these down OSDs is connected to the 2 nodes with the failure.
> >
> >     In the OSD logs I can see multiple entries like this:
> >     2019-12-09 11:13:10.378 7f9a372fb700  1 osd.374 pg_epoch: 493189
> >     pg[11.1992( v 457986'92619 (303558'88266,457986'92619]
> >     local-lis/les=466724/466725 n=4107 ec=8346/8346 lis/c 466724/466724
> >     les/c/f 466725/466725/176266 468956/493184/468423) [203,412] r=-1
> >     lpr=493184 pi=[466724,493184)/1 crt=457986'92619 lcod 0'0 unknown
> >     NOTIFY
> >     mbc={}] state<Start>: transitioning to Stray
> >
> >     I tried restarting the affected OSD without success; it is still
> >     marked as down.
> >
> >     Is there a procedure to overcome this issue, i.e. to get all OSDs
> >     up?
> >
> >     THX
> >     _______________________________________________
> >     ceph-users mailing list -- ceph-users@ceph.io
> >     To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>