I have to thank you all. You give free support and this already helps me. I'm not someone who knows Ceph that well yet, but every day it's getting better and better ;-)
According to the article Brad posted, I have to change the CRUSH tunables. But there are two questions left, as I already wrote:

- According to http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables there are a few profiles. The profile I need would be BOBTAIL (CRUSH_TUNABLES2), which would set choose_total_tries to 50 - for a start, better than 19. There I also read: "You can select a profile on a running cluster with the command: ceph osd crush tunables {PROFILE}". My question on this: even though I run hammer, is it possible and advisable to set the profile to bobtail? (See the sketch below this list.)

- We can also read:

      WHICH CLIENT VERSIONS SUPPORT CRUSH_TUNABLES2
      - v0.55 or later, including the bobtail series (v0.56.x)
      - Linux kernel version v3.9 or later (for the file system and RBD kernel clients)

  And here my question is: if my clients use librados (version hammer), do I need that kernel version on the clients or on the ceph nodes? I don't want to run into trouble with my clients in the end. Can someone answer this before I change the settings?
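For reference, here is roughly what I plan to run, if I understand the docs correctly (the /tmp paths are just placeholders I picked):

    # show the tunables the cluster is currently using
    ceph osd crush show-tunables

    # switch the whole cluster to the bobtail profile
    ceph osd crush tunables bobtail

Or, if only choose_total_tries should change rather than the whole profile, the crushmap can be edited directly:

    # export the crushmap, raise choose_total_tries to 50, inject it again
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -i /tmp/crushmap --set-choose-total-tries 50 -o /tmp/crushmap.new
    ceph osd setcrushmap -i /tmp/crushmap.new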
> On 11.01.2017 at 06:47, Shinobu Kinjo <ski...@redhat.com> wrote:
>
> Yeah, Sam is correct. I had not looked at the crushmap. But I should
> have noticed what the trouble is just by looking at `ceph osd tree`.
> That's my bad, sorry for that.
>
> Again, please refer to:
>
> http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/
>
> Regards,
>
>
> On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just <sj...@redhat.com> wrote:
>> Shinobu isn't correct, you have 9/9 osds up and running. up does not
>> equal acting because crush is having trouble fulfilling the weights in
>> your crushmap, and the acting set is being padded out with an extra osd
>> which happens to have the data, to keep you up to the right number of
>> replicas. Please refer back to Brad's post.
>> -Sam
>>
>> On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
>>> Ok, I understand, but how can I debug why they are not running as they
>>> should? I thought everything was fine because ceph -s said they are
>>> up and running.
>>>
>>> I would think of a problem with the crush map.
>>>
>>>> On 10.01.2017 at 08:06, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>
>>>> e.g.,
>>>> OSD 7 / 3 / 0 are in the same acting set. They should all be up, if
>>>> they are running properly.
>>>>
>>>> # 9.7
>>>> <snip>
>>>>>    "up": [
>>>>>        7,
>>>>>        3
>>>>>    ],
>>>>>    "acting": [
>>>>>        7,
>>>>>        3,
>>>>>        0
>>>>>    ],
>>>> <snip>
>>>>
>>>> Here is an example:
>>>>
>>>>    "up": [
>>>>        1,
>>>>        0,
>>>>        2
>>>>    ],
>>>>    "acting": [
>>>>        1,
>>>>        0,
>>>>        2
>>>>    ],
>>>>
>>>> Regards,
>>>>
>>>>
>>>> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
>>>>>> That's not perfectly correct.
>>>>>>
>>>>>> OSD.0/1/2 seem to be down.
>>>>>
>>>>> Sorry, but where do you see this? I think this indicates that they are up:
>>>>> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
>>>>>
>>>>>
>>>>>> On 10.01.2017 at 07:50, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>>>
>>>>>> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.mar...@posteo.de> wrote:
>>>>>>> All osds are currently up:
>>>>>>>
>>>>>>>     health HEALTH_WARN
>>>>>>>            4 pgs stuck unclean
>>>>>>>            recovery 4482/58798254 objects degraded (0.008%)
>>>>>>>            recovery 420522/58798254 objects misplaced (0.715%)
>>>>>>>            noscrub,nodeep-scrub flag(s) set
>>>>>>>     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>>>>>>>            election epoch 478, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
>>>>>>>     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>>>>>            flags noscrub,nodeep-scrub
>>>>>>>     pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>>>>>>>            15070 GB used, 40801 GB / 55872 GB avail
>>>>>>>            4482/58798254 objects degraded (0.008%)
>>>>>>>            420522/58798254 objects misplaced (0.715%)
>>>>>>>                 316 active+clean
>>>>>>>                   4 active+remapped
>>>>>>>     client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>>>>>>>
>>>>>>> This has not changed for two days or so.
>>>>>>>
>>>>>>> By the way, my ceph osd df now looks like this:
>>>>>>>
>>>>>>>     ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR
>>>>>>>      0 1.28899  1.00000 3724G 1699G 2024G 45.63 1.69
>>>>>>>      1 1.57899  1.00000 3724G 1708G 2015G 45.87 1.70
>>>>>>>      2 1.68900  1.00000 3724G 1695G 2028G 45.54 1.69
>>>>>>>      3 6.78499  1.00000 7450G 1241G 6208G 16.67 0.62
>>>>>>>      4 8.39999  1.00000 7450G 1228G 6221G 16.49 0.61
>>>>>>>      5 9.51500  1.00000 7450G 1239G 6210G 16.64 0.62
>>>>>>>      6 7.66499  1.00000 7450G 1265G 6184G 16.99 0.63
>>>>>>>      7 9.75499  1.00000 7450G 2497G 4952G 33.52 1.24
>>>>>>>      8 9.32999  1.00000 7450G 2495G 4954G 33.49 1.24
>>>>>>>                  TOTAL 55872G 15071G 40801G 26.97
>>>>>>>     MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>>>>>>>
>>>>>>> As you can see, osd2 has now also gone down to 45% use and "lost" data.
>>>>>>> But I also think this is no problem and ceph just cleans everything up
>>>>>>> after backfilling.
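Side note on Sam's point that CRUSH is having trouble fulfilling the weights: if I understand crushtool correctly, its test mode shows whether the map can find enough OSDs for every input. Rule 0 and num-rep 3 below are my assumptions for this cluster, not something stated in the thread:

    # simulate mappings with the live crushmap; rule and replica count are assumed
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -i /tmp/crushmap --test --show-bad-mappings --rule 0 --num-rep 3

Every "bad mapping" line it prints means CRUSH gave up before finding num-rep distinct OSDs - exactly the case that raising choose_total_tries is meant to help with.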
>>>>>>>
>>>>>>> On 10.01.2017 at 07:29, Shinobu Kinjo <ski...@redhat.com> wrote:
>>>>>>>
>>>>>>> Looking at ``ceph -s`` you originally provided, all OSDs are up.
>>>>>>>
>>>>>>>     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>>>>>>>
>>>>>>> But looking at ``pg query``, OSD.0 / 1 are not up. Are they somehow
>>>>>>
>>>>>> That's not perfectly correct.
>>>>>>
>>>>>> OSD.0/1/2 seem to be down.
>>>>>>
>>>>>>> related to this?:
>>>>>>>
>>>>>>> Ceph1, ceph2 and ceph3 are vms on one physical host
>>>>>>>
>>>>>>> Are those OSDs running on vm instances?
>>>>>>>
>>>>>>> # 9.7
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         7,
>>>>>>>         3
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         7,
>>>>>>>         3,
>>>>>>>         0
>>>>>>>     ],
>>>>>>> <snip>
>>>>>>>
>>>>>>> # 7.84
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         4,
>>>>>>>         8
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         4,
>>>>>>>         8,
>>>>>>>         1
>>>>>>>     ],
>>>>>>> <snip>
>>>>>>>
>>>>>>> # 8.1b
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         4,
>>>>>>>         7
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         4,
>>>>>>>         7,
>>>>>>>         2
>>>>>>>     ],
>>>>>>> <snip>
>>>>>>>
>>>>>>> # 7.7a
>>>>>>> <snip>
>>>>>>>     "state": "active+remapped",
>>>>>>>     "snap_trimq": "[]",
>>>>>>>     "epoch": 3114,
>>>>>>>     "up": [
>>>>>>>         7,
>>>>>>>         4
>>>>>>>     ],
>>>>>>>     "acting": [
>>>>>>>         7,
>>>>>>>         4,
>>>>>>>         2
>>>>>>>     ],
>>>>>>> <snip>
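For completeness, the states quoted above can be reproduced with something like this (9.7 is one of the four PG ids from this thread):

    # list all PGs that are stuck unclean
    ceph pg dump_stuck unclean

    # dump the full state of one PG, including its "up" and "acting" sets
    ceph pg 9.7 query

As Sam explained, a PG is reported as active+remapped when its acting set (e.g. [7,3,0]) differs from the up set (e.g. [7,3]) that CRUSH calculated.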