Hi, it seems that one of your OSD servers is dead. With Ceph's default settings (size=3, min_size=2), there should be three OSD hosts across which to distribute the objects' replicas. The important point is that only one of your OSD nodes is alive, so each PG has just 1 live replica, which is below min_size. That is why the PGs show as inactive.
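To recover, the usual first step is to bring the down OSDs on nodeB back up and to check the pools' replication settings. A sketch of the commands (the pool name "rbd" and the systemd-style service names are assumptions on my part; older installs may use `/etc/init.d/ceph` or `ceph-disk activate` instead):

```shell
# On nodeB, try to restart the down OSDs (IDs 5-9 per `ceph osd tree`);
# service names assume a systemd-based install:
sudo systemctl start ceph-osd@5
sudo systemctl status ceph-osd@5    # check /var/log/ceph/ if it fails

# Check each pool's replication settings ("rbd" is a placeholder;
# list the real pool names first):
ceph osd lspools
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# Temporary workaround only, at the cost of write safety: allow PGs to
# go active with a single replica until the other host is back:
ceph osd pool set rbd min_size 1
```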
2016-06-20 14:55 GMT+08:00 Ishmael Tsoaela <[email protected]>:

> Hi David,
>
> Apologies for the late response.
>
> NodeB is mon+client, nodeC is client:
>
> Ceph health detail:
>
> HEALTH_ERR 819 pgs are stuck inactive for more than 300 seconds; 883 pgs degraded; 64 pgs stale; 819 pgs stuck inactive; 1064 pgs stuck unclean; 883 pgs undersized; 22 requests are blocked > 32 sec; 3 osds have slow requests; recovery 2/8 objects degraded (25.000%); recovery 2/8 objects misplaced (25.000%); crush map has legacy tunables (require argonaut, min is firefly); crush map has straw_calc_version=0
> pg 2.fc is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
> pg 2.fd is stuck inactive since forever, current state undersized+degraded+peered, last acting [0]
> pg 2.fe is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
> pg 2.ff is stuck inactive since forever, current state undersized+degraded+peered, last acting [1]
> pg 1.fb is stuck inactive for 493857.572982, current state undersized+degraded+peered, last acting [4]
> pg 2.f8 is stuck inactive since forever, current state undersized+degraded+peered, last acting [3]
> pg 1.fa is stuck inactive for 492185.443146, current state undersized+degraded+peered, last acting [0]
> pg 2.f9 is stuck inactive since forever, current state undersized+degraded+peered, last acting [0]
> pg 1.f9 is stuck inactive for 492185.452890, current state undersized+degraded+peered, last acting [2]
> pg 2.fa is stuck inactive since forever, current state undersized+degraded+peered, last acting [3]
> pg 1.f8 is stuck inactive for 492185.443324, current state undersized+degraded+peered, last acting [0]
> pg 2.fb is stuck inactive since forever, current state undersized+degraded+peered, last acting [2]
> .
> .
> .
>
> pg 1.fb is undersized+degraded+peered, acting [4]
> pg 2.ff is undersized+degraded+peered, acting [1]
> pg 2.fe is undersized+degraded+peered, acting [2]
> pg 2.fd is undersized+degraded+peered, acting [0]
> pg 2.fc is undersized+degraded+peered, acting [2]
> 3 ops are blocked > 536871 sec on osd.4
> 15 ops are blocked > 268435 sec on osd.4
> 1 ops are blocked > 262.144 sec on osd.4
> 2 ops are blocked > 268435 sec on osd.3
> 1 ops are blocked > 268435 sec on osd.1
> 3 osds have slow requests
> recovery 2/8 objects degraded (25.000%)
> recovery 2/8 objects misplaced (25.000%)
> crush map has legacy tunables (require argonaut, min is firefly); see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
> crush map has straw_calc_version=0; see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> ceph osd stat:
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat ceph_osd_stat.txt
> osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
>        flags sortbitwise
>
> ceph osd tree:
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ ceph osd tree
> ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 9.08691 root default
> -2 4.54346     host nodeB
>  5 0.90869         osd.5     down        0          1.00000
>  6 0.90869         osd.6     down        0          1.00000
>  7 0.90869         osd.7     down        0          1.00000
>  8 0.90869         osd.8     down        0          1.00000
>  9 0.90869         osd.9     down        0          1.00000
> -3 4.54346     host nodeC
>  0 0.90869         osd.0       up  1.00000          1.00000
>  1 0.90869         osd.1       up  1.00000          1.00000
>  2 0.90869         osd.2       up  1.00000          1.00000
>  3 0.90869         osd.3       up  1.00000          1.00000
>  4 0.90869         osd.4       up  1.00000          1.00000
>
> CrushMap:
>
> # begin crush map
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host nodeB {
>     id -2    # do not change unnecessarily
>     # weight 4.543
>     alg straw
>     hash 0    # rjenkins1
>     item osd.5 weight 0.909
>     item osd.6 weight 0.909
>     item osd.7 weight 0.909
>     item osd.8 weight 0.909
>     item osd.9 weight 0.909
> }
> host nodeC {
>     id -3    # do not change unnecessarily
>     # weight 4.543
>     alg straw
>     hash 0    # rjenkins1
>     item osd.0 weight 0.909
>     item osd.1 weight 0.909
>     item osd.2 weight 0.909
>     item osd.3 weight 0.909
>     item osd.4 weight 0.909
> }
> root default {
>     id -1    # do not change unnecessarily
>     # weight 9.087
>     alg straw
>     hash 0    # rjenkins1
>     item nodeB weight 4.543
>     item nodeC weight 4.543
> }
>
> # rules
> rule replicated_ruleset {
>     ruleset 0
>     type replicated
>     min_size 1
>     max_size 10
>     step take default
>     step chooseleaf firstn 0 type host
>     step emit
> }
>
> # end crush map
>
> ceph.conf:
>
> cluster-admin@nodeB:~/.ssh/ceph-cluster$ cat /etc/ceph/ceph.conf
> [global]
> fsid = a04e9846-6c54-48ee-b26f-d6949d8bacb4
> mon_initial_members = nodeB
> mon_host = <mon IP>
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public_network = X.X.X.0/24
>
> On Sat, Jun 18, 2016 at 12:15 PM, David <[email protected]> wrote:
>
>> Is this a test cluster that has never been healthy or a working cluster
>> which has just gone unhealthy? Have you changed anything? Are all hosts,
>> drives, network links working? More detail please. Any/all of the following
>> would help:
>>
>> ceph health detail
>> ceph osd stat
>> ceph osd tree
>> Your ceph.conf
>> Your crushmap
>>
>> On 17 Jun 2016 14:14, "Ishmael Tsoaela" <[email protected]> wrote:
>> >
>> > Hi All,
>> >
>> > please assist to fix the error:
>> >
>> > 1 X admin
>> > 2 X admin (hosting admin as well)
>> >
>> > 4 osd each node
>>
>> Please provide more detail, this suggests you should have 12 osd's but
>> your osd map shows 10 osd's, 5 of which are down.
>> >
>> >     cluster a04e9846-6c54-48ee-b26f-d6949d8bacb4
>> >      health HEALTH_ERR
>> >             819 pgs are stuck inactive for more than 300 seconds
>> >             883 pgs degraded
>> >             64 pgs stale
>> >             819 pgs stuck inactive
>> >             245 pgs stuck unclean
>> >             883 pgs undersized
>> >             17 requests are blocked > 32 sec
>> >             recovery 2/8 objects degraded (25.000%)
>> >             recovery 2/8 objects misplaced (25.000%)
>> >             crush map has legacy tunables (require argonaut, min is firefly)
>> >             crush map has straw_calc_version=0
>> >      monmap e1: 1 mons at {nodeB=155.232.195.4:6789/0}
>> >             election epoch 7, quorum 0 nodeB
>> >      osdmap e80: 10 osds: 5 up, 5 in; 558 remapped pgs
>> >             flags sortbitwise
>> >       pgmap v480: 1064 pgs, 3 pools, 6454 bytes data, 4 objects
>> >             25791 MB used, 4627 GB / 4652 GB avail
>> >             2/8 objects degraded (25.000%)
>> >             2/8 objects misplaced (25.000%)
>> >                  819 undersized+degraded+peered
>> >                  181 active
>> >                   64 stale+active+undersized+degraded
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > [email protected]
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Best regards,
施柏安 Desmond Shih
技術研發部 Technical Development
<http://www.inwinstack.com/>
迎棧科技股份有限公司
│ 886-975-857-982 │ [email protected] │ 886-2-7738-2858 #7725
│ 新北市220板橋區遠東路3號5樓C室
Rm.C, 5F., No.3, Yuandong Rd., Banqiao Dist., New Taipei City 220, Taiwan (R.O.C)
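A side note on the two crush map warnings in the health output above: they are unrelated to the inactive PGs, but once the cluster is healthy again they can be cleared. A sketch (note that raising the tunables profile will trigger some data movement):

```shell
# Raise the CRUSH tunables profile to the reported minimum (firefly);
# expect some rebalancing traffic afterwards:
ceph osd crush tunables firefly

# Switch to the newer straw bucket weight calculation:
ceph osd crush set-tunable straw_calc_version 1
```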
