Hi David,

Apologies for the late response.

nodeB is mon+client, nodeC is client:



ceph health detail:

HEALTH_ERR 819 pgs are stuck inactive for more than 300 seconds; 883 pgs
degraded; 64 pgs stale; 819 pgs stuck inactive; 1064 pgs stuck unclean; 883
pgs undersized; 22 requests are blocked > 32 sec; 3 osds have slow
requests; recovery 2/8 objects degraded (25.000%); recovery 2/8 objects
misplaced (25.000%); crush map has legacy tunables (require argonaut, min
is firefly); crush map has straw_calc_version=0
pg 2.fc is stuck inactive since forever, current state
undersized+degraded+peered, last acting [2]
pg 2.fd is stuck inactive since forever, current state
undersized+degraded+peered, last acting [0]
pg 2.fe is stuck inactive since forever, current state
undersized+degraded+peered, last acting [2]
pg 2.ff is stuck inactive since forever, current state
undersized+degraded+peered, last acting [1]
pg 1.fb is stuck inactive for 493857.572982, current state
undersized+degraded+peered, last acting [4]
pg 2.f8 is stuck inactive since forever, current state
undersized+degraded+peered, last acting [3]
pg 1.fa is stuck inactive for 492185.443146, current state
undersized+degraded+peered, last acting [0]
pg 2.f9 is stuck inactive since forever, current state
undersized+degraded+peered, last acting [0]
pg 1.f9 is stuck inactive for 492185.452890, current state
undersized+degraded+peered, last acting [2]
pg 2.fa is stuck inactive since forever, current state
undersized+degraded+peered, last acting [3]
pg 1.f8 is stuck inactive for 492185.443324, current state
undersized+degraded+peered, last acting [0]
pg 2.fb is stuck inactive since forever, current state
undersized+degraded+peered, last acting [2]
.
.
.

pg 1.fb is undersized+degraded+peered, acting [4]
pg 2.ff is undersized+degraded+peered, acting [1]
pg 2.fe is undersized+degraded+peered, acting [2]
pg 2.fd is undersized+degraded+peered, acting [0]
pg 2.fc is undersized+degraded+peered, acting [2]
3 ops are blocked > 536871 sec on osd.4
15 ops are blocked > 268435 sec on osd.4
1 ops are blocked > 262.144 sec on osd.4
2 ops are blocked > 268435 sec on osd.3
1 ops are blocked > 268435 sec on osd.1
3 osds have slow requests
recovery 2/8 objects degraded (25.000%)
recovery 2/8 objects misplaced (25.000%)
crush map has legacy tunables (require argonaut, min is firefly); see
http://ceph.com/docs/master/rados/operations/crush-map/#tunables
crush map has straw_calc_version=0; see
http://ceph.com/docs/master/rados/operations/crush-map/#tunables
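(In case it helps: as I understand it from the linked tunables page, the last two warnings could be cleared along these lines. This is only a sketch; raising tunables can trigger data movement, and all clients must support the chosen profile, so I have not run it yet.)

```shell
# Raise the crush tunables to the firefly profile
# (clears the "legacy tunables (require argonaut, min is firefly)" warning;
# may cause some data to be rebalanced)
ceph osd crush tunables firefly

# Clear the straw_calc_version=0 warning
ceph osd crush set-tunable straw_calc_version 1
```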


ceph osd stat

cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat ceph_osd_stat.txt
     osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
            flags sortbitwise


ceph osd tree:

cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 9.08691 root default
-2 4.54346     host nodeB
 5 0.90869         osd.5     down        0          1.00000
 6 0.90869         osd.6     down        0          1.00000
 7 0.90869         osd.7     down        0          1.00000
 8 0.90869         osd.8     down        0          1.00000
 9 0.90869         osd.9     down        0          1.00000
-3 4.54346     host nodeC
 0 0.90869         osd.0       up  1.00000          1.00000
 1 0.90869         osd.1       up  1.00000          1.00000
 2 0.90869         osd.2       up  1.00000          1.00000
 3 0.90869         osd.3       up  1.00000          1.00000
 4 0.90869         osd.4       up  1.00000          1.00000
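(All five OSDs on nodeB are down. On nodeB I can check and restart them roughly as below; this assumes systemd-managed ceph-osd@N units, so the exact service names may differ on an upstart/sysvinit install.)

```shell
# On nodeB: check why each OSD daemon is down
for id in 5 6 7 8 9; do
    sudo systemctl status ceph-osd@$id --no-pager
done

# Try restarting one OSD and watch its log for the failure reason
sudo systemctl start ceph-osd@5
sudo tail -f /var/log/ceph/ceph-osd.5.log
```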




CrushMap:


# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host nodeB {
        id -2           # do not change unnecessarily
        # weight 4.543
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 0.909
        item osd.6 weight 0.909
        item osd.7 weight 0.909
        item osd.8 weight 0.909
        item osd.9 weight 0.909
}
host nodeC {
        id -3           # do not change unnecessarily
        # weight 4.543
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.909
        item osd.1 weight 0.909
        item osd.2 weight 0.909
        item osd.3 weight 0.909
        item osd.4 weight 0.909
}
root default {
        id -1           # do not change unnecessarily
        # weight 9.087
        alg straw
        hash 0  # rjenkins1
        item nodeB weight 4.543
        item nodeC weight 4.543
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
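(For reference, the dump above was produced with the usual getcrushmap/crushtool cycle; if edits are needed they would go back in the same way.)

```shell
# Fetch and decompile the current crush map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# After editing crushmap.txt, recompile and inject it
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```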



ceph.conf


cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat /etc/ceph/ceph.conf
[global]
fsid = a04e9846-6c54-48ee-b26f-d6949d8bacb4
mon_initial_members = nodeB
mon_host = <mon IP>
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = X.X.X.0/24





On Sat, Jun 18, 2016 at 12:15 PM, David <[email protected]> wrote:

> Is this a test cluster that has never been healthy or a working cluster
> which has just gone unhealthy?  Have you changed anything? Are all hosts,
> drives, network links working? More detail please. Any/all of the following
> would help:
>
> ceph health detail
> ceph osd stat
> ceph osd tree
> Your ceph.conf
> Your crushmap
>
> On 17 Jun 2016 14:14, "Ishmael Tsoaela" <[email protected]> wrote:
> >
> > Hi All,
> >
> > please assist to fix the error:
> >
> > 1 X admin
> > 2 X admin(hosting admin as well)
> >
> > 4 osd each node
>
> Please provide more detail, this suggests you should have 12 osd's but
> your osd map shows 10 osd's, 5 of which are down.
> >
> >
> > cluster a04e9846-6c54-48ee-b26f-d6949d8bacb4
> >      health HEALTH_ERR
> >             819 pgs are stuck inactive for more than 300 seconds
> >             883 pgs degraded
> >             64 pgs stale
> >             819 pgs stuck inactive
> >             245 pgs stuck unclean
> >             883 pgs undersized
> >             17 requests are blocked > 32 sec
> >             recovery 2/8 objects degraded (25.000%)
> >             recovery 2/8 objects misplaced (25.000%)
> >             crush map has legacy tunables (require argonaut, min is
> firefly)
> >             crush map has straw_calc_version=0
> >      monmap e1: 1 mons at {nodeB=155.232.195.4:6789/0}
> >             election epoch 7, quorum 0 nodeB
> >      osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
> >             flags sortbitwise
> >       pgmap v480: 1064 pgs, 3 pools, 6454 bytes data, 4 objects
> >             25791 MB used, 4627 GB / 4652 GB avail
> >             2/8 objects degraded (25.000%)
> >             2/8 objects misplaced (25.000%)
> >                  819 undersized+degraded+peered
> >                  181 active
> >                   64 stale+active+undersized+degraded
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
