# ceph health detail | grep 'ops are blocked'
# ceph osd blocked-by

My guess is that you have an OSD that is in a funky state blocking the
requests and the peering. Let me know what the output of those commands is.
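If those point at a specific OSD (the health detail below shows osd.13 as
the last acting OSD for both stuck PGs), it is also worth looking at its
blocked ops over the admin socket on that OSD's host. A sketch, assuming
the culprit is osd.13:

# ceph daemon osd.13 dump_ops_in_flight
# ceph daemon osd.13 dump_historic_ops

The first lists the requests currently stuck on that OSD and what they are
waiting on; the second shows recently completed slow ops. That usually
narrows it down to peering, a slow disk, or a stuck client.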
Also, what are the replica sizes of your 2 pools? The health detail shows
that only 1 OSD (osd.13) was last acting for the 2 inactive PGs. Not sure
yet if that is anything of concern, but I didn't want to ignore it;
commands for checking both are sketched below the quoted message.

On Fri, Jun 23, 2017 at 1:16 PM Daniel Davidson <[email protected]> wrote:

> Two of our OSD systems hit 75% disk utilization, so I added another
> system to try and bring that back down. The system was usable for a day
> while the data was being migrated, but now the system is not responding
> when I try to mount it:
>
> mount -t ceph ceph-0,ceph-1,ceph-2,ceph-3:6789:/ /home -o
> name=admin,secretfile=/etc/ceph/admin.secret
> mount error 5 = Input/output error
>
> Here is our ceph health:
>
> [root@ceph-3 ~]# ceph -s
>     cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
>      health HEALTH_ERR
>             2 pgs are stuck inactive for more than 300 seconds
>             58 pgs backfill_wait
>             20 pgs backfilling
>             3 pgs degraded
>             2 pgs stuck inactive
>             76 pgs stuck unclean
>             2 pgs undersized
>             100 requests are blocked > 32 sec
>             recovery 1197145/653713908 objects degraded (0.183%)
>             recovery 47420551/653713908 objects misplaced (7.254%)
>             mds0: Behind on trimming (180/30)
>             mds0: Client biologin-0 failing to respond to capability release
>             mds0: Many clients (20) failing to respond to cache pressure
>      monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
>             election epoch 542, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
>       fsmap e17666: 1/1/1 up {0=ceph-0=up:active}, 3 up:standby
>      osdmap e25535: 32 osds: 32 up, 32 in; 78 remapped pgs
>             flags sortbitwise,require_jewel_osds
>       pgmap v19199544: 1536 pgs, 2 pools, 786 TB data, 299 Mobjects
>             1595 TB used, 1024 TB / 2619 TB avail
>             1197145/653713908 objects degraded (0.183%)
>             47420551/653713908 objects misplaced (7.254%)
>                 1448 active+clean
>                   58 active+remapped+wait_backfill
>                   17 active+remapped+backfilling
>                   10 active+clean+scrubbing+deep
>                    2 undersized+degraded+remapped+backfilling+peered
>                    1 active+degraded+remapped+backfilling
>   recovery io 906 MB/s, 331 objects/s
>
> Checking in on the inactive PGs:
>
> [root@ceph-control ~]# ceph health detail | grep inactive
> HEALTH_ERR 2 pgs are stuck inactive for more than 300 seconds; 58 pgs
> backfill_wait; 20 pgs backfilling; 3 pgs degraded; 2 pgs stuck inactive;
> 78 pgs stuck unclean; 2 pgs undersized; 100 requests are blocked > 32
> sec; 1 osds have slow requests; recovery 1197145/653713908 objects
> degraded (0.183%); recovery 47390082/653713908 objects misplaced
> (7.249%); mds0: Behind on trimming (180/30); mds0: Client biologin-0
> failing to respond to capability release; mds0: Many clients (20)
> failing to respond to cache pressure
> pg 2.1b5 is stuck inactive for 77215.112164, current state
> undersized+degraded+remapped+backfilling+peered, last acting [13]
> pg 2.145 is stuck inactive for 76910.328647, current state
> undersized+degraded+remapped+backfilling+peered, last acting [13]
>
> If I query, then I don't get a response:
>
> [root@ceph-control ~]# ceph pg 2.1b5 query
>
> Any ideas on what to do?
>
> Dan
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
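For the replica-size question above, a quick check (a sketch; <pool> stands
in for your actual pool names, which ceph osd lspools will list):

# ceph osd pool ls detail
# ceph osd pool get <pool> size
# ceph osd pool get <pool> min_size

And to see where the two stuck PGs are mapped versus where they are acting:

# ceph pg map 2.1b5
# ceph pg map 2.145

If min_size is higher than the number of replicas currently acting (just
osd.13 here), those PGs will stay inactive (peered) until backfill restores
enough copies, which would match what you are seeing.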
