Hi,

could this be related to this other Mimic upgrade thread [1]? Your failing MONs sound a bit like the problem described there, and eventually the user reported a successful recovery. You could try the described steps:

 - disable cephx auth with 'auth_cluster_required = none'
 - set 'mon_osd_cache_size = 200000' (default 10)
 - set 'osd_heartbeat_interval = 30'
 - set 'mon_lease = 75'
 - increase 'rocksdb_cache_size' and 'leveldb_cache_size' on the MONs to be big enough to cache the entire DB
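Collected in one place, those settings would land in ceph.conf roughly as below. The two cache sizes are placeholders of mine (size them so the whole mon DB fits); the other values are taken verbatim from the thread, so read it before applying anything:

```
# ceph.conf fragment -- values from the linked thread [1]; cache sizes assumed
[global]
auth_cluster_required = none    # temporarily disable cephx inside the cluster

[mon]
mon_osd_cache_size = 200000     # default 10
mon_lease = 75                  # default 5
rocksdb_cache_size = 536870912  # ~512 MiB placeholder; big enough for the whole DB
leveldb_cache_size = 536870912

[osd]
osd_heartbeat_interval = 30     # default 6
```

Remember to revert 'auth_cluster_required' once the MONs are stable again.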

I just copied the mentioned steps, so please read the thread before applying anything.

Regards,
Eugen

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-September/030018.html


Quoting by morphin <morphinwith...@gmail.com>:

After trying so many things with a lot of help on IRC, my pool
health is still in ERROR and I think I can't recover from this.
https://paste.ubuntu.com/p/HbsFnfkYDT/
In the end, 2 of the 3 MONs crashed and restarted at the same time, and the pool
went offline. Recovery takes more than 12 hours, which is way too slow.
Somehow recovery does not seem to be working.

If I can reach my data I can re-create the pool easily.
If I run the ceph-objectstore-tool procedure to regenerate the mon store.db, can I
access the RBD pool again?
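For reference, the mon store rebuild in question regenerates store.db from the cluster maps held by the OSDs, and since the RBD data itself lives on the OSDs, a successfully rebuilt MON store should make the pool reachable again. A rough sketch follows; the paths and keyring location are placeholders, so verify the exact procedure against the Mimic disaster-recovery documentation before running anything:

```
# Stop all ceph-mon daemons first. Then, on each OSD host, extract the
# cluster maps from every local OSD into a scratch store:
ms=/tmp/mon-store
mkdir -p "$ms"
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" \
        --op update-mon-db --mon-store-path "$ms"
done

# Collect the scratch stores from every host onto a single node, then
# rebuild the monitor store with a keyring that holds the admin key:
ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring

# Back up one mon's store.db, replace it with the rebuilt copy,
# and start that mon.
```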
by morphin <morphinwith...@gmail.com> wrote on Tue, 25 Sep 2018 at 20:03:

Hi,

Cluster is still down :(

Up to now we have managed to stabilize the OSDs. 118 of the 160 OSDs are
stable and the cluster is still in the process of settling. Thanks to
Be-El in the ceph IRC channel, who helped a lot to make the
flapping OSDs stable.

What we have learned so far is that the cause was the sudden death of
2 of the 3 monitor servers, and that when they come back, if they do not start
one by one (each joining the cluster after the previous one), this can happen:
the cluster can become unhealthy and it can take countless hours to come back.

Right now here is our status:
ceph -s : https://paste.ubuntu.com/p/6DbgqnGS7t/
health detail: https://paste.ubuntu.com/p/w4gccnqZjR/

Since the OSD disks are NL-SAS, it can take up to 24 hours for an online
cluster. Worse, we have been told that we would be extremely lucky
if all the data is rescued.

Most unhappily, our strategy is just to sit and wait :(. As soon as the
peering and activating count drops to 300-500 PGs we will restart the
stopped OSDs one by one, and for each OSD we will wait for the cluster to
settle down. The amount of data stored in the OSDs is 33 TB. Our main
concern is to export our RBD pool data to a backup space. Then
we will start again with a clean one.
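One way to do such an export, once the pool is reachable, is image by image with 'rbd export'; the pool name and destination path below are placeholders:

```
# Dump every RBD image in the pool to flat files on a backup mount.
pool=rbd
dest=/backup/rbd-export
mkdir -p "$dest"
for img in $(rbd ls "$pool"); do
    rbd export "$pool/$img" "$dest/$img.img"
done
```

'rbd export-diff' / 'import-diff' would also work if incremental copies are wanted later.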

I hope an expert can confirm our analysis. Any help or advice
would be greatly appreciated.
by morphin <morphinwith...@gmail.com> wrote on Tue, 25 Sep 2018 at 15:08:
>
> Reducing the recovery parameter values did not change much.
> There are a lot of OSD still marked down.
>
> I don't know what I need to do after this point.
>
> [osd]
> osd recovery op priority = 63
> osd client op priority = 1
> osd recovery max active = 1
> osd max scrubs = 1
>
>
> ceph -s
>   cluster:
>     id:     89569e73-eb89-41a4-9fc9-d2a5ec5f4106
>     health: HEALTH_ERR
>             42 osds down
>             1 host (6 osds) down
>             61/8948582 objects unfound (0.001%)
>             Reduced data availability: 3837 pgs inactive, 1822 pgs
> down, 1900 pgs peering, 6 pgs stale
>             Possible data damage: 18 pgs recovery_unfound
>             Degraded data redundancy: 457246/17897164 objects degraded
> (2.555%), 213 pgs degraded, 209 pgs undersized
>             2554 slow requests are blocked > 32 sec
>             3273 slow ops, oldest one blocked for 1453 sec, daemons
> [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]...
> have slow ops.
>
>   services:
>     mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
>     mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3,
> SRV-SEKUARK4
>     osd: 168 osds: 118 up, 160 in
>
>   data:
>     pools:   1 pools, 4096 pgs
>     objects: 8.95 M objects, 17 TiB
>     usage:   33 TiB used, 553 TiB / 586 TiB avail
>     pgs:     93.677% pgs not active
>              457246/17897164 objects degraded (2.555%)
>              61/8948582 objects unfound (0.001%)
>              1676 down
>              1372 peering
>              528  stale+peering
>              164  active+undersized+degraded
>              145  stale+down
>              73   activating
>              40   active+clean
>              29   stale+activating
>              17   active+recovery_unfound+undersized+degraded
>              16   stale+active+clean
>              16   stale+active+undersized+degraded
>              9    activating+undersized+degraded
>              3    active+recovery_wait+degraded
>              2    activating+undersized
>              2    activating+degraded
>              1    creating+down
>              1    stale+active+recovery_unfound+undersized+degraded
>              1    stale+active+clean+scrubbing+deep
>              1    stale+active+recovery_wait+degraded
>
> ceph -w: https://paste.ubuntu.com/p/WZ2YqzS86S/
> ceph health detail: https://paste.ubuntu.com/p/8w7Jpms8fj/
> by morphin <morphinwith...@gmail.com> wrote on Tue, 25 Sep 2018 at 14:32:
> >
> > The config didn't work, because increasing the numbers caused more OSD drops.
> >
> > ceph -s
> >   cluster:
> >     id:     89569e73-eb89-41a4-9fc9-d2a5ec5f4106
> >     health: HEALTH_ERR
> >             norebalance,norecover flag(s) set
> >             1 osds down
> >             17/8839434 objects unfound (0.000%)
> >             Reduced data availability: 3578 pgs inactive, 861 pgs
> > down, 1928 pgs peering, 11 pgs stale
> >             Degraded data redundancy: 44853/17678868 objects degraded
> > (0.254%), 221 pgs degraded, 20 pgs undersized
> >             610 slow requests are blocked > 32 sec
> >             3996 stuck requests are blocked > 4096 sec
> >             6076 slow ops, oldest one blocked for 4129 sec, daemons
> > [osd.0,osd.1,osd.10,osd.100,osd.101,osd.102,osd.103,osd.104,osd.105,osd.106]...
> > have slow ops.
> >
> >   services:
> >     mon: 3 daemons, quorum SRV-SEKUARK3,SRV-SBKUARK2,SRV-SBKUARK3
> >     mgr: SRV-SBKUARK2(active), standbys: SRV-SEKUARK2, SRV-SEKUARK3
> >     osd: 168 osds: 128 up, 129 in; 2 remapped pgs
> >          flags norebalance,norecover
> >
> >   data:
> >     pools:   1 pools, 4096 pgs
> >     objects: 8.84 M objects, 17 TiB
> >     usage:   26 TiB used, 450 TiB / 477 TiB avail
> >     pgs:     0.024% pgs unknown
> >              89.160% pgs not active
> >              44853/17678868 objects degraded (0.254%)
> >              17/8839434 objects unfound (0.000%)
> >              1612 peering
> >              720  down
> >              583  activating
> >              319  stale+peering
> >              255  active+clean
> >              157  stale+activating
> >              108  stale+down
> >              95   activating+degraded
> >              84   stale+active+clean
> >              50   active+recovery_wait+degraded
> >              29   creating+down
> >              23   stale+activating+degraded
> >              18   stale+active+recovery_wait+degraded
> >              14   active+undersized+degraded
> >              12   active+recovering+degraded
> >              4    stale+creating+down
> >              3    stale+active+recovering+degraded
> >              3    stale+active+undersized+degraded
> >              2    stale
> >              1    active+recovery_wait+undersized+degraded
> >              1    active+clean+scrubbing+deep
> >              1    unknown
> >              1    active+undersized+degraded+remapped+backfilling
> >              1    active+recovering+undersized+degraded
> >
> > I guess the OSD down/drop issue increases the recovery time, so I
> > decided to try decreasing the recovery parameters to put less load on
> > the cluster.
> > I have NVMe and SAS disks. The servers are powerful enough and the network is 4x10Gb.
> > I don't think my cluster is in bad shape, because I have datacenter
> > redundancy (14 servers + 14 servers). The 7 crashed servers are all in
> > datacenter A, and it took only a few minutes to bring them back online. Also,
> > 2 of them are monitors, so cluster I/O should have been suspended and there
> > should be little data difference.
> >
> > On the other hand, I don't understand this recovery burden. I have
> > faced many recoveries, but none of them stopped my cluster from working. This
> > recovery burden is so high that it hasn't stopped for hours. I wish I
> > could just decrease the recovery speed and continue to serve my VMs.
> > Has the recovery load behaviour changed in Mimic?
> > Luminous was pretty fine indeed.
> > by morphin <morphinwith...@gmail.com> wrote on Tue, 25 Sep 2018 at 13:57:
> > >
> > > Thank you for the answer.
> > >
> > > What do you think of this conf to speed up the recovery?
> > >
> > > [osd]
> > > osd recovery op priority = 63
> > > osd client op priority = 1
> > > osd recovery max active = 16
> > > osd max scrubs = 16
> > > The user with the address <ad...@data-center.com> wrote on Tue, 25 Sep
> > > 2018 at 13:37:
> > > >
> > > > Just let it recover.
> > > >
> > > >   data:
> > > >     pools:   1 pools, 4096 pgs
> > > >     objects: 8.95 M objects, 17 TiB
> > > >     usage:   34 TiB used, 577 TiB / 611 TiB avail
> > > >     pgs:     94.873% pgs not active
> > > >              48475/17901254 objects degraded (0.271%)
> > > >              1/8950627 objects unfound (0.000%)
> > > >              2631 peering
> > > >              637  activating
> > > >              562  down
> > > >              159  active+clean
> > > >              44   activating+degraded
> > > >              30   active+recovery_wait+degraded
> > > >              12   activating+undersized+degraded
> > > >              10   active+recovering+degraded
> > > >              10   active+undersized+degraded
> > > >              1    active+clean+scrubbing+deep
> > > >
> > > > You've got deep-scrubbing PGs, which put considerable IO load on the OSDs.
> > > >
> > > >
> > > > September 25, 2018 1:23 PM, "by morphin" <morphinwith...@gmail.com> wrote:
> > > >
> > > >
> > > > > What should I do now?
> > > > >
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



