Can you run the following and share the decompiled map?

    ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o ./crushmap.txt
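For reference, a decompiled crushmap.txt begins with a tunables block, and one quick thing to look for is whether chooseleaf_vary_r is set, since the bobtail profile lacks it and that is what tends to break placement on small host counts. A sketch of that check, using a sample tunables header rather than the actual map from this cluster:

```shell
# Sample tunables header as it would appear in crushmap.txt under the
# bobtail profile (illustrative values, not taken from this cluster):
cat > crushmap.txt <<'EOF'
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
EOF

# The firefly profile adds "tunable chooseleaf_vary_r 1"; when it is
# absent, CRUSH retries can fail to find enough independent hosts for
# some PGs, leaving them undersized.
if grep -q 'chooseleaf_vary_r 1' crushmap.txt; then
  echo "firefly-era tunables"
else
  echo "bobtail-era tunables"
fi
```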
On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum <[email protected]> wrote:
> Situations that are stable with lots of undersized PGs like this generally
> mean that the CRUSH map is failing to allocate enough OSDs for certain
> PGs. The log you have says the OSD is trying to NOTIFY the new primary
> that the PG exists here on this replica.
>
> I'd guess you only have 3 hosts and are trying to place all your
> replicas on independent boxes. Bobtail tunables have trouble with that
> and you're going to need to pay the cost of moving to more modern
> ones.
> -Greg
>
> On Fri, Feb 17, 2017 at 5:30 AM, Matyas Koszik <[email protected]> wrote:
>>
>> I'm not sure which variable I should be looking at exactly, but after
>> reading through all of them I don't see anything suspicious; all values are
>> 0. I'm attaching it anyway, in case I missed something:
>> https://atw.hu/~koszik/ceph/osd26-perf
>>
>> I tried debugging the ceph pg query a bit more, and it seems that it
>> gets stuck communicating with the mon - it doesn't even try to connect to
>> the osd.
>> This is the end of the log:
>>
>> 13:36:07.006224 sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"\7", 1},
>> {"\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\17\0\177\0\2\0\27\0\0\0\0\0\0\0\0\0"...,
>> 53}, {"\1\0\0\0\6\0\0\0osdmap9\4\1\0\0\0\0\0\1", 23},
>> {"\255UC\211\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1", 21}], msg_controllen=0,
>> msg_flags=0}, MSG_NOSIGNAL) = 98
>> 13:36:07.207010 recvfrom(3, "\10\6\0\0\0\0\0\0\0", 4096, MSG_DONTWAIT, NULL,
>> NULL) = 9
>> 13:36:09.963843 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1},
>> {"9\356\246X\245\330r9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL)
>> = 9
>> 13:36:09.964340 recvfrom(3, "\0179\356\246X\245\330r9", 4096, MSG_DONTWAIT,
>> NULL, NULL) = 9
>> 13:36:19.964154 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1},
>> {"C\356\246X\24\226w9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) =
>> 9
>> 13:36:19.964573 recvfrom(3, "\17C\356\246X\24\226w9", 4096, MSG_DONTWAIT,
>> NULL, NULL) = 9
>> 13:36:29.964439 sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1},
>> {"M\356\246X|\353{9", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
>> 13:36:29.964938 recvfrom(3, "\17M\356\246X|\353{9", 4096, MSG_DONTWAIT,
>> NULL, NULL) = 9
>>
>> ... and this goes on for as long as I let it. When I kill it, I get this:
>> RuntimeError: "None": exception "['{"prefix": "get_command_descriptions",
>> "pgid": "6.245"}']": exception 'int' object is not iterable
>>
>> I restarted (again) osd26 with max debugging; after grepping for 6.245,
>> this is the log I get:
>> https://atw.hu/~koszik/ceph/ceph-osd.26.log.6245
>>
>> Matyas
>>
>>
>> On Fri, 17 Feb 2017, Tomasz Kuzemko wrote:
>>
>>> If the PG cannot be queried, I would bet on the OSD message throttler.
>>> Check with "ceph --admin-daemon PATH_TO_ADMIN_SOCK perf dump" on each OSD
>>> holding this PG whether the message throttler's current value equals its
>>> max. If it does, increase the max value in ceph.conf and restart the OSD.
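Tomasz's throttler check can be scripted against the perf dump output. A sketch, where both the throttler section name and the JSON fragment are made-up samples for illustration (the exact counter names vary by Ceph version), not real output from this cluster:

```shell
# Made-up fragment of `perf dump` output from an OSD admin socket; the
# throttler name below is an assumption, not necessarily the one that
# matters on a given version.
perf_dump='
"throttle-osd_client_messages": {
    "val": 100,
    "max": 100
},'

# If the current value ("val") has reached "max", the throttler is
# saturated and further messages (e.g. a pg query) queue indefinitely.
val=$(printf '%s\n' "$perf_dump" | awk -F': ' '/"val"/ {gsub(/,/,"",$2); print $2}')
max=$(printf '%s\n' "$perf_dump" | awk -F': ' '/"max"/ {gsub(/,/,"",$2); print $2}')
if [ "$val" -ge "$max" ]; then
  echo "throttler saturated at $val/$max"
fi
```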
>>>
>>> --
>>> Tomasz Kuzemko
>>> [email protected]
>>>
>>> On 17.02.2017 at 01:59, Matyas Koszik <[email protected]> wrote:
>>>
>>> >
>>> > Hi,
>>> >
>>> > It seems that my ceph cluster is in an erroneous state which I cannot
>>> > see right now how to get out of.
>>> >
>>> > The status is the following:
>>> >
>>> >   health HEALTH_WARN
>>> >          25 pgs degraded
>>> >          1 pgs stale
>>> >          26 pgs stuck unclean
>>> >          25 pgs undersized
>>> >          recovery 23578/9450442 objects degraded (0.249%)
>>> >          recovery 45/9450442 objects misplaced (0.000%)
>>> >          crush map has legacy tunables (require bobtail, min is firefly)
>>> >   monmap e17: 3 mons at x
>>> >          election epoch 8550, quorum 0,1,2 store1,store3,store2
>>> >   osdmap e66602: 68 osds: 68 up, 68 in; 1 remapped pgs
>>> >          flags require_jewel_osds
>>> >   pgmap v31433805: 4388 pgs, 8 pools, 18329 GB data, 4614 kobjects
>>> >          36750 GB used, 61947 GB / 98697 GB avail
>>> >          23578/9450442 objects degraded (0.249%)
>>> >          45/9450442 objects misplaced (0.000%)
>>> >              4362 active+clean
>>> >                24 active+undersized+degraded
>>> >                 1 stale+active+undersized+degraded+remapped
>>> >                 1 active+remapped
>>> >
>>> >
>>> > I tried restarting all OSDs, to no avail; it actually made things a bit
>>> > worse.
>>> > From a user's point of view the cluster works perfectly (apart from that
>>> > stale pg, which fortunately hit the pool on which I keep swap images
>>> > only).
>>> >
>>> > A little background: I made the mistake of creating the cluster with
>>> > size=2 pools, which I'm now in the process of rectifying, but that
>>> > requires some fiddling around. I also tried moving to more optimal
>>> > tunables (firefly), but the documentation is a bit optimistic
>>> > with the 'up to 10%' data movement - it was over 50% in my case, so I
>>> > reverted to bobtail immediately after I saw that number. I then started
>>> > reweighting the osds in anticipation of the size=3 bump, and I think
>>> > that's when this bug hit me.
>>> >
>>> > Right now I have a pg (6.245) that cannot even be queried - the command
>>> > times out, or gives this output: https://atw.hu/~koszik/ceph/pg6.245
>>> >
>>> > I queried a few other pgs that are acting up, but cannot see anything
>>> > suspicious, other than the fact that they do not have a working peer:
>>> > https://atw.hu/~koszik/ceph/pg4.2ca
>>> > https://atw.hu/~koszik/ceph/pg4.2e4
>>> >
>>> > Health details can be found here: https://atw.hu/~koszik/ceph/health
>>> > OSD tree: https://atw.hu/~koszik/ceph/tree (here the weight sum of
>>> > ssd/store3_ssd seems to be off, but that has been the case for quite
>>> > some time - not sure if it's related to any of this)
>>> >
>>> > I tried setting debugging to 20/20 on some of the affected osds, but
>>> > there was nothing there that gave me any ideas on solving this. How
>>> > should I continue debugging this issue?
>>> >
>>> > BTW, I'm running 10.2.5 on all of my osd/mon nodes.
>>> >
>>> > Thanks,
>>> > Matyas
>>> >
>>> >
>>> > _______________________________________________
>>> > ceph-users mailing list
>>> > [email protected]
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
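As a footnote, the degraded-object percentage quoted throughout the thread is simply degraded objects over total objects; the raw counts in the status output are consistent with the 0.249% ceph reports:

```shell
# recovery 23578/9450442 objects degraded (0.249%) -- re-deriving the
# percentage from the raw counts shown in `ceph status`:
degraded=23578
total=9450442
pct=$(awk -v d="$degraded" -v t="$total" 'BEGIN { printf "%.3f", d / t * 100 }')
echo "$pct%"   # prints 0.249%, matching the status line
```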
