Hi,

We are validating Kraken 11.2.0 with BlueStore on a 5-node cluster with EC
4+1.

When an OSD goes down, peering does not happen and the ceph health status
moves to ERR after a few minutes. This was working in previous development
releases. Is any additional configuration required in v11.2.0?
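
One thing we are not sure about: with EC 4+1, losing a single OSD leaves
exactly k=4 shards per PG, so if the pool's min_size happens to be 5 the PGs
could not go active until the OSD returns. In case that is relevant, this is
how we would check it (<poolname> and <profilename> are placeholders for our
EC pool and its profile):

# <poolname>/<profilename> below are placeholders, not our real names
# ceph osd pool get <poolname> erasure_code_profile
# ceph osd erasure-code-profile get <profilename>
# ceph osd pool get <poolname> min_size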

Following is our ceph configuration:

mon_osd_down_out_interval = 30
mon_osd_report_timeout = 30
mon_osd_down_out_subtree_limit = host
mon_osd_reporter_subtree_level = host

and the recovery parameters are set to their defaults.
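
For completeness, we verify the running values directly on a monitor (run on
ca-cn1, where mon.ca-cn1 lives; the grep pattern is just illustrative):

# run on ca-cn1 against the local mon admin socket
# ceph daemon mon.ca-cn1 config show | grep mon_osd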

[root@ca-cn1 ceph]# ceph osd crush show-tunables

{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "jewel",
    "optimal_tunables": 1,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 1,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 0,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}

ceph status:

     health HEALTH_ERR
            173 pgs are stuck inactive for more than 300 seconds
            173 pgs incomplete
            173 pgs stuck inactive
            173 pgs stuck unclean
     monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
            election epoch 106, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
        mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
     osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
            85199 GB used, 238 TB / 322 TB avail
                1868 active+clean
                 173 remapped+incomplete
                   7 active+clean+scrubbing
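
To dig further into the 173 remapped+incomplete PGs we can list and query
them (<pgid> is a placeholder for one of the stuck PG ids):

# <pgid> is a placeholder for a stuck PG id from the dump below
# ceph pg dump_stuck inactive
# ceph pg <pgid> query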

MON log:

2017-01-20 09:25:54.715684 7f55bcafb700  0 log_channel(cluster) log [INF] :
osd.54 out (down for 31.703786)
2017-01-20 09:25:54.725688 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1120
crush map has features 288250512065953792, adjusting msgr requires
2017-01-20 09:25:54.729019 7f55bf4d5700  0 log_channel(cluster) log [INF] :
osdmap e1120: 60 osds: 59 up, 59 in
2017-01-20 09:25:54.735987 7f55bf4d5700  0 log_channel(cluster) log [INF] :
pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6
active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB
avail; 21825 B/s rd, 163 MB/s wr, 2046 op/s
2017-01-20 09:25:55.737749 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1121
crush map has features 288250512065953792, adjusting msgr requires
2017-01-20 09:25:55.744338 7f55bf4d5700  0 log_channel(cluster) log [INF] :
osdmap e1121: 60 osds: 59 up, 59 in
2017-01-20 09:25:55.749616 7f55bf4d5700  0 log_channel(cluster) log [INF] :
pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144
incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB
/ 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
2017-01-20 09:25:56.768721 7f55bf4d5700  0 log_channel(cluster) log [INF] :
pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126
incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB
/ 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
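
If more detail would help, we can also capture the following right after the
OSD is marked out and share the output:

# ceph health detail
# ceph osd tree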

Thanks,
Muthu