This log message is expected. But your current situation is a good example of why having a separate cluster network is a bad idea in most setups. The first thing I'd do here is get rid of the cluster network and see if that helps.
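For reference, roughly what that would look like on a cluster using the centralized config database (Mimic/Nautilus or later) with systemd-managed OSDs -- this is only a sketch, adjust it to wherever your cluster actually carries the setting (it may live in ceph.conf on each node instead):

  # check whether a separate cluster network is configured at all
  ceph config get osd cluster_network
  grep -E 'cluster[ _]network' /etc/ceph/ceph.conf

  # drop the setting (or delete the line from ceph.conf on every node)
  ceph config rm global cluster_network

  # restart OSDs one node at a time so they stop binding to the broken network
  systemctl restart ceph-osd.target

With the setting gone, the OSDs use the public network for replication and heartbeats as well, so a flaky back-end link can no longer take them down on its own.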
Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Dec 9, 2019 at 11:22 AM Thomas Schneider <74cmo...@gmail.com> wrote:
> Hi,
> I had a failure on 2 of 7 OSD nodes.
> This caused a server reboot, and unfortunately the cluster network failed
> to come up.
>
> This resulted in many OSDs being marked down.
>
> I decided to stop all services (OSD, MGR, MON) and to start them
> sequentially.
>
> Now I have multiple OSDs marked as down although the service is running.
> None of these down OSDs is connected to the 2 failed nodes.
>
> In the OSD logs I can see multiple entries like this:
> 2019-12-09 11:13:10.378 7f9a372fb700 1 osd.374 pg_epoch: 493189
> pg[11.1992( v 457986'92619 (303558'88266,457986'92619]
> local-lis/les=466724/466725 n=4107 ec=8346/8346 lis/c 466724/466724
> les/c/f 466725/466725/176266 468956/493184/468423) [203,412] r=-1
> lpr=493184 pi=[466724,493184)/1 crt=457986'92619 lcod 0'0 unknown NOTIFY
> mbc={}] state<Start>: transitioning to Stray
>
> I tried to restart the impacted OSDs without success; the relevant OSDs
> are still marked as down.
>
> Is there a procedure to overcome this issue, i.e. getting all OSDs up again?
>
> THX
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
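PS, for anyone debugging the same symptom (daemon running, but the monitors keep marking it down), a few commands that usually narrow it down -- osd.374 is taken from the log above purely as an example:

  ceph health detail            # lists the OSDs currently flagged down
  ceph osd tree down            # down OSDs grouped by host
  ceph osd find 374             # the addresses the cluster expects for this OSD
  ceph tell osd.374 version     # can it be reached over the public network?
  ceph daemon osd.374 status    # run on the OSD's host, via the admin socket

If ceph tell reaches the daemon but the monitors still mark it down, a likely culprit is the heartbeat path over the (broken) cluster network, which is exactly what dropping that network avoids.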