Does "ceph health detail" work?
Have you manually confirmed the OSDs on the nodes are working?
What was the replica size of the pools?
Are you seeing any progress with the recovery?
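
Roughly what I'd check first, from one of the MON nodes (<pool> and <pgid>
below are placeholders for your own pool names and the stuck PG IDs):

  ceph health detail                 # per-PG breakdown of what is incomplete/stuck
  ceph osd tree                      # confirm all 25 OSDs really are up/in on the right hosts
  ceph osd pool get <pool> size      # configured replica count per pool
  ceph osd pool get <pool> min_size  # PGs with fewer copies than this stop serving I/O
  ceph pg dump_stuck inactive        # list the inactive/incomplete PGs
  ceph pg <pgid> query               # detail on one stuck PG (may hang if its OSDs are unresponsive)
  ceph -w                            # watch whether degraded/misplaced counts are actually falling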



On Sun, Sep 2, 2018 at 9:42 AM Lee <lqui...@gmail.com> wrote:

> Running 0.94.5 as part of an OpenStack environment, our Ceph setup is 3x OSD
> nodes and 3x MON nodes. Yesterday we had an aircon outage in our hosting
> environment: one OSD node failed (offline with the journal SSD dead), leaving
> 2 nodes running correctly. Two hours later a second OSD node failed,
> complaining of read/write errors to the physical drives; I assume this was a
> heat issue, as once rebooted it came back online OK and Ceph started to
> repair itself. We have since brought the first failed node back by replacing
> the SSD and recreating the journals, hoping it would all repair.
> Our pools have a minimum of 2 replicas.
>
> The problem we have is that client IO (reads) is totally blocked, and when I
> query the stuck PGs it just hangs.
>
> For example, the check version command just errors with:
>
> Error EINTR: problem getting command descriptions from various OSDs, so
> I cannot even query the inactive PGs.
>
> root@node31-a4:~# ceph -s
>     cluster 7c24e1b9-24b3-4a1b-8889-9b2d7fd88cd2
>      health HEALTH_WARN
>             83 pgs backfill
>             2 pgs backfill_toofull
>             3 pgs backfilling
>             48 pgs degraded
>             1 pgs down
>             31 pgs incomplete
>             1 pgs recovering
>             29 pgs recovery_wait
>             1 pgs stale
>             48 pgs stuck degraded
>             31 pgs stuck inactive
>             1 pgs stuck stale
>             148 pgs stuck unclean
>             17 pgs stuck undersized
>             17 pgs undersized
>             599 requests are blocked > 32 sec
>             recovery 111489/4697618 objects degraded (2.373%)
>             recovery 772268/4697618 objects misplaced (16.440%)
>             recovery 1/2171314 unfound (0.000%)
>      monmap e5: 3 mons at {bc07s12-a7=172.27.16.11:6789/0,bc07s13-a7=172.27.16.21:6789/0,bc07s14-a7=172.27.16.15:6789/0}
>             election epoch 198, quorum 0,1,2 bc07s12-a7,bc07s14-a7,bc07s13-a7
>      osdmap e18727: 25 osds: 25 up, 25 in; 90 remapped pgs
>       pgmap v70996322: 1792 pgs, 13 pools, 8210 GB data, 2120 kobjects
>             16783 GB used, 6487 GB / 23270 GB avail
>             111489/4697618 objects degraded (2.373%)
>             772268/4697618 objects misplaced (16.440%)
>             1/2171314 unfound (0.000%)
>                 1639 active+clean
>                   66 active+remapped+wait_backfill
>                   30 incomplete
>                   25 active+recovery_wait+degraded
>                   15 active+undersized+degraded+remapped+wait_backfill
>                    4 active+recovery_wait+degraded+remapped
>                    4 active+clean+scrubbing
>                    2 active+remapped+wait_backfill+backfill_toofull
>                    1 down+incomplete
>                    1 active+remapped+backfilling
>                    1 active+clean+scrubbing+deep
>                    1 stale+active+undersized+degraded
>                    1 active+undersized+degraded+remapped+backfilling
>                    1 active+degraded+remapped+backfilling
>                    1 active+recovering+degraded
> recovery io 29385 kB/s, 7 objects/s
>   client io 5877 B/s wr, 1 op/s
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
