Thanks for the response:

[root@ceph-control ~]# ceph health detail | grep 'ops are blocked'
100 ops are blocked > 134218 sec on osd.13
[root@ceph-control ~]# ceph osd blocked-by
osd num_blocked

A problem with osd.13?
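
In case it is useful, the next thing I was going to try is to look at what
osd.13 is actually stuck on (a sketch; this assumes I can reach the admin
socket on the host that carries osd.13):

# run on osd.13's host
ceph daemon osd.13 dump_ops_in_flight    # ops currently blocked in the OSD
ceph daemon osd.13 dump_historic_ops     # recent slow ops, with durations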

Dan

On 06/23/2017 02:03 PM, David Turner wrote:
# ceph health detail | grep 'ops are blocked'
# ceph osd blocked-by

My guess is that you have an OSD in a funky state that is blocking both the requests and the peering. Let me know what the output of those commands is.
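
If it does turn out to be one wedged OSD, the usual next step is to bounce it so everything re-peers (a sketch; substitute the real OSD id, and this assumes a systemd-based install):

# on the OSD's host: restart the daemon
systemctl restart ceph-osd@<id>
# or, from any admin node: mark it down and let it re-peer
ceph osd down <id>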

Also, what are the replica sizes of your 2 pools? The health output shows that only 1 OSD was last acting for the 2 inactive PGs. Not sure yet whether that is a concern, but I didn't want to ignore it.
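
The replica sizes are visible straight from the pool list (a sketch):

# shows size and min_size for each pool
ceph osd pool ls detail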

On Fri, Jun 23, 2017 at 1:16 PM Daniel Davidson <[email protected]> wrote:

    Two of our OSD systems hit 75% disk utilization, so I added another
    system to try to bring that back down.  The system was usable for a
    day while the data was being migrated, but now it is not responding
    when I try to mount it:

      mount -t ceph ceph-0,ceph-1,ceph-2,ceph-3:6789:/ /home -o name=admin,secretfile=/etc/ceph/admin.secret
      mount error 5 = Input/output error
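
    (For what it is worth, the kernel cephfs client usually logs its side
    of the failure, so a quick place to look, assuming a kernel-client
    mount, is:

      dmesg | tail

    though error 5 on its own does not say much.)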

    Here is our ceph health:

    [root@ceph-3 ~]# ceph -s
         cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77
          health HEALTH_ERR
                 2 pgs are stuck inactive for more than 300 seconds
                 58 pgs backfill_wait
                 20 pgs backfilling
                 3 pgs degraded
                 2 pgs stuck inactive
                 76 pgs stuck unclean
                 2 pgs undersized
                 100 requests are blocked > 32 sec
                 recovery 1197145/653713908 objects degraded (0.183%)
                 recovery 47420551/653713908 objects misplaced (7.254%)
                 mds0: Behind on trimming (180/30)
                 mds0: Client biologin-0 failing to respond to capability
    release
                 mds0: Many clients (20) failing to respond to cache
    pressure
          monmap e3: 4 mons at {ceph-0=172.16.31.1:6789/0,ceph-1=172.16.31.2:6789/0,ceph-2=172.16.31.3:6789/0,ceph-3=172.16.31.4:6789/0}
                 election epoch 542, quorum 0,1,2,3 ceph-0,ceph-1,ceph-2,ceph-3
           fsmap e17666: 1/1/1 up {0=ceph-0=up:active}, 3 up:standby
          osdmap e25535: 32 osds: 32 up, 32 in; 78 remapped pgs
                 flags sortbitwise,require_jewel_osds
           pgmap v19199544: 1536 pgs, 2 pools, 786 TB data, 299 Mobjects
                 1595 TB used, 1024 TB / 2619 TB avail
                 1197145/653713908 objects degraded (0.183%)
                 47420551/653713908 objects misplaced (7.254%)
                     1448 active+clean
                       58 active+remapped+wait_backfill
                       17 active+remapped+backfilling
                       10 active+clean+scrubbing+deep
                        2 undersized+degraded+remapped+backfilling+peered
                        1 active+degraded+remapped+backfilling
    recovery io 906 MB/s, 331 objects/s

    Checking in on the inactive PGs:

    [root@ceph-control ~]# ceph health detail |grep inactive
    HEALTH_ERR 2 pgs are stuck inactive for more than 300 seconds; 58 pgs
    backfill_wait; 20 pgs backfilling; 3 pgs degraded; 2 pgs stuck
    inactive;
    78 pgs stuck unclean; 2 pgs undersized; 100 requests are blocked > 32
    sec; 1 osds have slow requests; recovery 1197145/653713908 objects
    degraded (0.183%); recovery 47390082/653713908 objects misplaced
    (7.249%); mds0: Behind on trimming (180/30); mds0: Client biologin-0
    failing to respond to capability release; mds0: Many clients (20)
    failing to respond to cache pressure
    pg 2.1b5 is stuck inactive for 77215.112164, current state undersized+degraded+remapped+backfilling+peered, last acting [13]
    pg 2.145 is stuck inactive for 76910.328647, current state undersized+degraded+remapped+backfilling+peered, last acting [13]

    If I query one of them, I don't get a response:

    [root@ceph-control ~]# ceph pg 2.1b5 query
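
    The query presumably hangs because it has to contact the PG's primary,
    which is osd.13 here.  The mapping itself can still be read from the
    monitors without touching that OSD (a sketch):

    [root@ceph-control ~]# ceph pg map 2.1b5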

    Any ideas on what to do?

    Dan

    _______________________________________________
    ceph-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

