On Sat, Feb 18, 2017 at 9:03 AM, Matyas Koszik <kos...@atw.hu> wrote:
>
>
> Looks like you've provided me with the solution, thanks!

:)

> I've set the tunables to firefly, and now I only see the normal states
> associated with a recovering cluster; there are no more stale pgs.
> I hope it'll stay like this when it's done, but that'll take quite a
> while.
>
> Matyas
>
>
> On Fri, 17 Feb 2017, Gregory Farnum wrote:
>
>> Situations that are stable with lots of undersized PGs like this generally
>> mean that the CRUSH map is failing to allocate enough OSDs for certain
>> PGs. The log you have says the OSD is trying to NOTIFY the new primary
>> that the PG exists here on this replica.
>>
>> I'd guess you only have 3 hosts and are trying to place all your
>> replicas on independent boxes. Bobtail tunables have trouble with that
>> and you're going to need to pay the cost of moving to more modern
>> ones.
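>>
>> For reference, a minimal sketch of the check and the switch (standard
>> ceph CLI; switching profiles triggers substantial data movement, as you
>> saw):
>>
>>   ceph osd crush show-tunables      # dump the currently active tunables
>>   ceph osd crush tunables firefly   # switch profiles; expect a rebalance
>>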
>> -Greg
>>
>> On Fri, Feb 17, 2017 at 5:30 AM, Matyas Koszik <kos...@atw.hu> wrote:
>> >
>> >
>> > I'm not sure what variable I should be looking at exactly, but after
>> > reading through all of them I don't see anything suspicious; all values
>> > are 0. I'm attaching it anyway, in case I missed something:
>> > https://atw.hu/~koszik/ceph/osd26-perf
>> >
>> >
>> > I tried debugging the ceph pg query a bit more, and it seems that it
>> > gets stuck communicating with the mon - it doesn't even try to connect to
>> > the osd. This is the end of the log:
>> >
>> > 13:36:07.006224 sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"\7", 1}, {"\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\17\0\177\0\2\0\27\0\0\0\0\0\0\0\0\0"..., 53}, {"\1\0\0\0\6\0\0\0osdmap9\4\1\0\0\0\0\0\1", 23}, {"\255UC\211\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1", 21}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 98
>> > 13:36:07.207010 recvfrom(3, "\10\6\0\0\0\0\0\0\0", 4096, MSG_DONTWAIT, NULL, NULL) = 9
>> > 13:36:09.963843 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"9\356\246X\245\330r9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
>> > 13:36:09.964340 recvfrom(3, "\0179\356\246X\245\330r9", 4096, MSG_DONTWAIT, NULL, NULL) = 9
>> > 13:36:19.964154 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"C\356\246X\24\226w9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
>> > 13:36:19.964573 recvfrom(3, "\17C\356\246X\24\226w9", 4096, MSG_DONTWAIT, NULL, NULL) = 9
>> > 13:36:29.964439 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"M\356\246X|\353{9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
>> > 13:36:29.964938 recvfrom(3, "\17M\356\246X|\353{9", 4096, MSG_DONTWAIT, NULL, NULL) = 9
>> >
>> > ... and this goes on for as long as I let it. When I kill it, I get this:
>> > RuntimeError: "None": exception "['{"prefix": "get_command_descriptions", 
>> > "pgid": "6.245"}']": exception 'int' object is not iterable
>> >
>> > I restarted (again) osd26 with max debugging; after grepping for 6.245,
>> > this is the log I get:
>> > https://atw.hu/~koszik/ceph/ceph-osd.26.log.6245
>> >
>> > Matyas
>> >
>> >
>> > On Fri, 17 Feb 2017, Tomasz Kuzemko wrote:
>> >
>> >> If the PG cannot be queried I would bet on the OSD message throttler.
>> >> Check with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each
>> >> OSD that is holding this PG whether the message throttler's current
>> >> value has reached its max. If it has, increase the max value in
>> >> ceph.conf and restart the OSD.
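>> >>
>> >> A minimal sketch of that check, assuming jq is available and the default
>> >> admin socket path (adjust the OSD id as needed):
>> >>
>> >>   ceph --admin-daemon /var/run/ceph/ceph-osd.26.asok perf dump \
>> >>     | jq 'to_entries[]
>> >>           | select(.key | startswith("throttle-"))
>> >>           | {(.key): {val: .value.val, max: .value.max}}'
>> >>
>> >> Any throttler whose val sits at its max is a candidate for a larger limit.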
>> >>
>> >> --
>> >> Tomasz Kuzemko
>> >> tomasz.kuze...@corp.ovh.com
>> >>
>> >> On 17.02.2017 at 01:59, Matyas Koszik <kos...@atw.hu> wrote:
>> >>
>> >> >
>> >> > Hi,
>> >> >
>> >> > It seems that my ceph cluster is in an erroneous state that I cannot
>> >> > see right now how to get out of.
>> >> >
>> >> > The status is the following:
>> >> >
>> >> > health HEALTH_WARN
>> >> >       25 pgs degraded
>> >> >       1 pgs stale
>> >> >       26 pgs stuck unclean
>> >> >       25 pgs undersized
>> >> >       recovery 23578/9450442 objects degraded (0.249%)
>> >> >       recovery 45/9450442 objects misplaced (0.000%)
>> >> >       crush map has legacy tunables (require bobtail, min is firefly)
>> >> > monmap e17: 3 mons at x
>> >> >       election epoch 8550, quorum 0,1,2 store1,store3,store2
>> >> > osdmap e66602: 68 osds: 68 up, 68 in; 1 remapped pgs
>> >> >       flags require_jewel_osds
>> >> > pgmap v31433805: 4388 pgs, 8 pools, 18329 GB data, 4614 kobjects
>> >> >       36750 GB used, 61947 GB / 98697 GB avail
>> >> >       23578/9450442 objects degraded (0.249%)
>> >> >       45/9450442 objects misplaced (0.000%)
>> >> >           4362 active+clean
>> >> >             24 active+undersized+degraded
>> >> >              1 stale+active+undersized+degraded+remapped
>> >> >              1 active+remapped
>> >> >
>> >> >
>> >> > I tried restarting all OSDs, to no avail; it actually made things a bit
>> >> > worse.
>> >> > From a user point of view the cluster works perfectly (apart from that
>> >> > stale pg, which fortunately hit the pool on which I keep only swap
>> >> > images).
>> >> >
>> >> > A little background: I made the mistake of creating the cluster with
>> >> > size=2 pools, which I'm now in the process of rectifying, but that
>> >> > requires some fiddling around. I also tried moving to more optimal
>> >> > tunables (firefly), but the documentation is a bit optimistic
>> >> > with the 'up to 10%' data movement - it was over 50% in my case, so I
>> >> > reverted to bobtail immediately after I saw that number. I then started
>> >> > reweighting the osds in anticipation of the size=3 bump, and I think
>> >> > that's when this bug hit me.
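>> >> >
>> >> > For reference, the size bump itself is per pool (sketched here with a
>> >> > placeholder pool name; each change triggers its own round of backfill):
>> >> >
>> >> >   ceph osd pool set <pool> size 3
>> >> >   ceph osd pool set <pool> min_size 2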
>> >> >
>> >> > Right now I have a pg (6.245) that cannot even be queried - the command
>> >> > times out, or gives this output: https://atw.hu/~koszik/ceph/pg6.245
>> >> >
>> >> > I queried a few other pgs that are acting up, but cannot see anything
>> >> > suspicious, other than the fact they do not have a working peer:
>> >> > https://atw.hu/~koszik/ceph/pg4.2ca
>> >> > https://atw.hu/~koszik/ceph/pg4.2e4
>> >> >
>> >> > Health details can be found here: https://atw.hu/~koszik/ceph/health
>> >> > OSD tree: https://atw.hu/~koszik/ceph/tree (here the weight sum of
>> >> > ssd/store3_ssd seems to be off, but that has been the case for quite
>> >> > some time - not sure if it's related to any of this)
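>> >> >
>> >> > To double-check those weight sums at the CRUSH level, a quick sketch
>> >> > with the standard tools:
>> >> >
>> >> >   ceph osd getcrushmap -o crushmap.bin
>> >> >   crushtool -d crushmap.bin -o crushmap.txt
>> >> >
>> >> > and compare the bucket item weights in crushmap.txt with the sums shown
>> >> > by 'ceph osd tree'.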
>> >> >
>> >> >
>> >> > I tried setting debugging to 20/20 on some of the affected osds, but
>> >> > there was nothing there that gave me any ideas on solving this. How
>> >> > should I continue debugging this issue?
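>> >> >
>> >> > (For reference, that kind of debug level can be injected at runtime
>> >> > rather than via ceph.conf, assuming injectargs is accepted:
>> >> >
>> >> >   ceph tell osd.26 injectargs '--debug-osd 20/20 --debug-ms 1'
>> >> >
>> >> > and set it back to the defaults afterwards.)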
>> >> >
>> >> > BTW, I'm running 10.2.5 on all of my osd/mon nodes.
>> >> >
>> >> > Thanks,
>> >> > Matyas
>> >> >
>> >> >
>> >>
>> >
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
