I have to thank you all. You give free support and this already helps me. I'm not someone who knows Ceph that well yet, but every day it's getting better and better ;-)
According to the article Brad posted, I have to change the CRUSH tunables. But there are two questions left, as I already wrote:

- According to http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables there are a few profiles. The profile I need would be BOBTAIL (CRUSH_TUNABLES2), which would set choose_total_tries to 50 - for a start, better than 19. There I also read: "You can select a profile on a running cluster with the command: ceph osd crush tunables {PROFILE}". My question on this: even though I run hammer, is it possible and advisable to set the profile to bobtail? (See the sketch below this list.)

- We can also read:

      WHICH CLIENT VERSIONS SUPPORT CRUSH_TUNABLES2
      - v0.55 or later, including the bobtail series (v0.56.x)
      - Linux kernel version v3.9 or later (for the file system and RBD kernel clients)

  And here my question is: if my clients use librados (version hammer), do I need that kernel version on the clients or on the ceph nodes? I don't want to run into trouble with my clients in the end. Can someone answer this before I change the settings?
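For reference, here is roughly what I plan to run, if I understand the docs correctly (the /tmp paths are just placeholders I picked):

    # show the tunables the cluster is currently using
    ceph osd crush show-tunables

    # switch the whole cluster to the bobtail profile
    ceph osd crush tunables bobtail

Or, if only choose_total_tries should change rather than the whole profile, the crushmap can be edited directly:

    # export the crushmap, raise choose_total_tries to 50, inject it again
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -i /tmp/crushmap --set-choose-total-tries 50 -o /tmp/crushmap.new
    ceph osd setcrushmap -i /tmp/crushmap.new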
> On 11.01.2017 at 06:47, Shinobu Kinjo <ski...@redhat.com> wrote:
>
> Yeah, Sam is correct. I had not looked at the crushmap. But I should
> have noticed what the trouble is just by looking at `ceph osd tree`.
> That's my bad, sorry for that.
>
> Again, please refer to:
>
> http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/
>
> Regards,
>
>
> On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just <sj...@redhat.com> wrote:
>> Shinobu isn't correct, you have 9/9 osds up and running. up does not
>> equal acting because crush is having trouble fulfilling the weights in
>> your crushmap, and the acting set is being padded out with an extra osd
>> which happens to have the data, to keep you up to the right number of
>> replicas. Please refer back to Brad's post.
>> -Sam
>>
>> On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
>>> Ok, I understand, but how can I debug why they are not running as they
>>> should? I thought everything was fine because ceph -s said they are
>>> up and running.
>>>
>>> I would think of a problem with the crush map.
>>>
>>>> On 10.01.2017 at 08:06, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>
>>>> e.g.,
>>>> OSD 7 / 3 / 0 are in the same acting set. They should all be up, if
>>>> they are running properly.
>>>>
>>>> # 9.7
>>>> <snip>
>>>>>    "up": [
>>>>>        7,
>>>>>        3
>>>>>    ],
>>>>>    "acting": [
>>>>>        7,
>>>>>        3,
>>>>>        0
>>>>>    ],
>>>> <snip>
>>>>
>>>> Here is an example:
>>>>
>>>>    "up": [
>>>>        1,
>>>>        0,
>>>>        2
>>>>    ],
>>>>    "acting": [
>>>>        1,
>>>>        0,
>>>>        2
>>>>    ],
>>>>
>>>> Regards,
>>>>
>>>>
>>>> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
>>>>>> That's not perfectly correct.
>>>>>>
>>>>>> OSD.0/1/2 seem to be down.
>>>>>
>>>>> Sorry, but where do you see this? I think this indicates that they are up:
>>>>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
>>>>>
>>>>>
>>>>>> On 10.01.2017 at 07:50, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>>>
>>>>>> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
>>>>>>> All osds are currently up:
>>>>>>>
>>>>>>>     health HEALTH_WARN
>>>>>>>            4 pgs stuck unclean
>>>>>>>            recovery 4482/58798254 objects degraded (0.008%)
>>>>>>>            recovery 420522/58798254 objects misplaced (0.715%)
>>>>>>>            noscrub,nodeep-scrub flag(s) set
>>>>>>>     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>>>>>>            election epoch 478, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
>>>>>>>     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>>>>>            flags noscrub,nodeep-scrub
>>>>>>>     pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>>>>>>>            15070 GB used, 40801 GB / 55872 GB avail
>>>>>>>            4482/58798254 objects degraded (0.008%)
>>>>>>>            420522/58798254 objects misplaced (0.715%)
>>>>>>>                 316 active+clean
>>>>>>>                   4 active+remapped
>>>>>>>     client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>>>>>>>
>>>>>>> This has not changed for two days or so.
>>>>>>>
>>>>>>> By the way, my ceph osd df now looks like this:
>>>>>>>
>>>>>>>     ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR
>>>>>>>      0 1.28899  1.00000 3724G 1699G 2024G 45.63 1.69
>>>>>>>      1 1.57899  1.00000 3724G 1708G 2015G 45.87 1.70
>>>>>>>      2 1.68900  1.00000 3724G 1695G 2028G 45.54 1.69
>>>>>>>      3 6.78499  1.00000 7450G 1241G 6208G 16.67 0.62
>>>>>>>      4 8.39999  1.00000 7450G 1228G 6221G 16.49 0.61
>>>>>>>      5 9.51500  1.00000 7450G 1239G 6210G 16.64 0.62
>>>>>>>      6 7.66499  1.00000 7450G 1265G 6184G 16.99 0.63
>>>>>>>      7 9.75499  1.00000 7450G 2497G 4952G 33.52 1.24
>>>>>>>      8 9.32999  1.00000 7450G 2495G 4954G 33.49 1.24
>>>>>>>                  TOTAL 55872G 15071G 40801G 26.97
>>>>>>>     MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>>>>>>>
>>>>>>> As you can see, osd2 has now also gone down to 45% use and "lost" data.
>>>>>>> But I also think this is no problem and ceph just cleans everything up
>>>>>>> after backfilling.
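Side note on Sam's point that CRUSH is having trouble fulfilling the weights: if I understand crushtool correctly, its test mode shows whether the map can find enough OSDs for every input. Rule 0 and num-rep 3 below are my assumptions for this cluster, not something stated in the thread:

    # simulate mappings with the live crushmap; rule and replica count are assumed
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -i /tmp/crushmap --test --show-bad-mappings --rule 0 --num-rep 3

Every "bad mapping" line it prints means CRUSH gave up before finding num-rep distinct OSDs - exactly the case that raising choose_total_tries is meant to help with.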
>>>>>>>
>>>>>>> On 10.01.2017 at 07:29, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>>>>
>>>>>>> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>>>>>>>
>>>>>>>     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>>>>>
>>>>>>> But looking at ``pg query``, OSD.0 / 1 are not up. Are they somehow
>>>>>>
>>>>>> That's not perfectly correct.
>>>>>>
>>>>>> OSD.0/1/2 seem to be down.
>>>>>>
>>>>>>> related to this?:
>>>>>>>
>>>>>>> Ceph1, ceph2 and ceph3 are vms on one physical host
>>>>>>>
>>>>>>> Are those OSDs running on vm instances?
>>>>>>>
>>>>>>> # 9.7
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         7,
>>>>>>>         3
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         7,
>>>>>>>         3,
>>>>>>>         0
>>>>>>>     ],
>>>>>>> <snip>
>>>>>>>
>>>>>>> # 7.84
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         4,
>>>>>>>         8
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         4,
>>>>>>>         8,
>>>>>>>         1
>>>>>>>     ],
>>>>>>> <snip>
>>>>>>>
>>>>>>> # 8.1b
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         4,
>>>>>>>         7
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         4,
>>>>>>>         7,
>>>>>>>         2
>>>>>>>     ],
>>>>>>> <snip>
>>>>>>>
>>>>>>> # 7.7a
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         7,
>>>>>>>         4
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         7,
>>>>>>>         4,
>>>>>>>         2
>>>>>>>     ],
>>>>>>> <snip>
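For completeness, the states quoted above can be reproduced with something like this (9.7 is one of the four PG ids from this thread):

    # list all PGs that are stuck unclean
    ceph pg dump_stuck unclean

    # dump the full state of one PG, including its "up" and "acting" sets
    ceph pg 9.7 query

As Sam explained, a PG is reported as active+remapped when its acting set (e.g. [7,3,0]) differs from the up set (e.g. [7,3]) that CRUSH calculated.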