I think I figured it out! All 4 of the OSDs on one host (OSDs 107-110) were
sending massive amounts of auth requests to the monitors, seemingly
overwhelming them.
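In case it's useful to anyone chasing something similar: a flood like this
should show up if you dump the monitor's sessions over its admin socket,
since every session entry carries the peer address. A rough sketch, assuming
the default admin socket location; adjust the path for your deployment:

  # List all sessions currently open on this monitor; a host flooding
  # auth requests shows up as a stream of entries from the same address.
  ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok sessions

  # Optionally raise messenger debugging to watch the auth traffic
  # itself. This is very chatty, so remember to turn it back down.
  ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config set debug_ms 1

Counting repeated source addresses in that output should point straight at
the offending host.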
The weird bit is that I removed them (osd crush remove, auth del, osd rm),
dd'd the box and all of the disks, reinstalled, and guess what? They are
still sending a lot of requests to the MONs... This will require some
further investigation. As this is happening during my holidays, I have just
disabled them and will investigate further when I get back.

On Fri, Jul 24, 2015 at 11:11 PM, Kjetil Jørgensen <[email protected]> wrote:

> It sounds slightly similar to what I just experienced.
>
> I had one monitor out of three which seemed to essentially run one core
> at full tilt continuously, and had its virtual address space allocated to
> the point where top started calling it Tb. Requests hitting this monitor
> did not get very timely responses (although I don't know whether this was
> happening consistently or arbitrarily).
>
> I ended up re-building the monitor from the two healthy ones I had, which
> made the problem go away for me.
>
> After-the-fact inspection of the monitor I ripped out clocked it in at
> 1.3Gb, compared to the 250Mb of the other two; after the rebuild they are
> all comparable in size.
>
> In my case this started out on firefly and persisted after upgrading to
> hammer, which prompted the rebuild, as I suspected it was related to
> "something" persistent for this monitor.
>
> I do not have much more useful to contribute to this discussion, since
> I've more-or-less destroyed any evidence by re-building the monitor.
>
> Cheers,
> KJ
>
> On Fri, Jul 24, 2015 at 1:55 PM, Luis Periquito <[email protected]> wrote:
>
>> The leveldb is smallish: around 70mb.
>>
>> I ran debug mon = 10 for a while, but couldn't find any interesting
>> information. I would run out of space quite quickly though, as the log
>> partition only has 10g.
>>
>> On 24 Jul 2015 21:13, "Mark Nelson" <[email protected]> wrote:
>>
>>> On 07/24/2015 02:31 PM, Luis Periquito wrote:
>>>
>>>> Now it's official, I have a weird one!
>>>>
>>>> Restarted one of the ceph-mons with jemalloc and it didn't make any
>>>> difference. It's still using a lot of cpu and still not freeing up
>>>> memory...
>>>>
>>>> The issue is that the cluster almost stops responding to requests,
>>>> and if I restart the primary mon (which had almost no memory or cpu
>>>> usage) the cluster goes back on its merry way, responding to
>>>> requests.
>>>>
>>>> Does anyone have any idea what may be going on? The worst bit is that
>>>> I have several clusters just like this one (well, they are smaller),
>>>> and as we do everything with puppet they should all be very
>>>> similar... and all the other clusters are working just fine, without
>>>> any issues whatsoever...
>>>
>>> We've seen cases where leveldb can't compact fast enough and memory
>>> balloons, but it's usually associated with extreme CPU usage as well.
>>> It would be showing up in perf though, if that were the case...
>>>
>>>> On 24 Jul 2015 10:11, "Jan Schermer" <[email protected]> wrote:
>>>>
>>>> You don't (shouldn't) need to rebuild the binary to use jemalloc. It
>>>> should be possible to do something like
>>>>
>>>> LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd …
>>>>
>>>> The last time we tried it, it segfaulted after a few minutes, so YMMV
>>>> and be careful.
>>>>
>>>> Jan
>>>>
>>>> On 23 Jul 2015, at 18:18, Luis Periquito <[email protected]> wrote:
>>>>>
>>>>> Hi Greg,
>>>>>
>>>>> I've been looking at the tcmalloc issues, but they seemed to affect
>>>>> the osds, and I do notice it in heavy read workloads (even after the
>>>>> patch and increasing
>>>>> TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This is affecting
>>>>> the mon process though.
>>>>>
>>>>> Looking at perf top, I'm getting most of the CPU usage in mutex
>>>>> lock/unlock:
>>>>>
>>>>>   5.02%  libpthread-2.19.so  [.] pthread_mutex_unlock
>>>>>   3.82%  libsoftokn3.so      [.] 0x000000000001e7cb
>>>>>   3.46%  libpthread-2.19.so  [.] pthread_mutex_lock
>>>>>
>>>>> I could try to use jemalloc; are you aware of any built binaries?
>>>>> Can I mix a cluster with different malloc binaries?
>>>>>
>>>>> On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum <[email protected]> wrote:
>>>>>
>>>>> On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito <[email protected]> wrote:
>>>>> > The ceph-mon is already taking a lot of memory, and I ran a heap
>>>>> > stats:
>>>>> > ------------------------------------------------
>>>>> > MALLOC:       32391696 (   30.9 MiB) Bytes in use by application
>>>>> > MALLOC: + 27597135872 (26318.7 MiB) Bytes in page heap freelist
>>>>> > MALLOC: +    16598552 (   15.8 MiB) Bytes in central cache freelist
>>>>> > MALLOC: +    14693536 (   14.0 MiB) Bytes in transfer cache freelist
>>>>> > MALLOC: +    17441592 (   16.6 MiB) Bytes in thread cache freelists
>>>>> > MALLOC: +   116387992 (  111.0 MiB) Bytes in malloc metadata
>>>>> > MALLOC:   ------------
>>>>> > MALLOC: = 27794649240 (26507.0 MiB) Actual memory used (physical + swap)
>>>>> > MALLOC: +    26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
>>>>> > MALLOC:   ------------
>>>>> > MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used
>>>>> > MALLOC:
>>>>> > MALLOC:           5683 Spans in use
>>>>> > MALLOC:             21 Thread heaps in use
>>>>> > MALLOC:           8192 Tcmalloc page size
>>>>> > ------------------------------------------------
>>>>> >
>>>>> > After that I ran the heap release and it went back to normal:
>>>>> > ------------------------------------------------
>>>>> > MALLOC:       22919616 (   21.9 MiB) Bytes in use by application
>>>>> > MALLOC: +     4792320 (    4.6 MiB) Bytes in page heap freelist
>>>>> > MALLOC: +    18743448 (   17.9 MiB) Bytes in central cache freelist
>>>>> > MALLOC: +    20645776 (   19.7 MiB) Bytes in transfer cache freelist
>>>>> > MALLOC: +    18456088 (   17.6 MiB) Bytes in thread cache freelists
>>>>> > MALLOC: +   116387992 (  111.0 MiB) Bytes in malloc metadata
>>>>> > MALLOC:   ------------
>>>>> > MALLOC: =   201945240 (  192.6 MiB) Actual memory used (physical + swap)
>>>>> > MALLOC: + 27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
>>>>> > MALLOC:   ------------
>>>>> > MALLOC: = 27820765336 (26531.9 MiB) Virtual address space used
>>>>> > MALLOC:
>>>>> > MALLOC:           5639 Spans in use
>>>>> > MALLOC:             29 Thread heaps in use
>>>>> > MALLOC:           8192 Tcmalloc page size
>>>>> > ------------------------------------------------
>>>>> >
>>>>> > So it just seems the monitor is not returning unused memory to the
>>>>> > OS, or not reusing already allocated memory it deems as free...
>>>>>
>>>>> Yep.
>>>>> This is a bug (best we can tell) in some versions of tcmalloc
>>>>> combined with certain distribution stacks, although I don't think
>>>>> we've seen it reported on Trusty (nor on a tcmalloc distribution
>>>>> that new) before. Alternatively, some folks are seeing tcmalloc use
>>>>> up lots of CPU in other scenarios involving memory return, and it
>>>>> may manifest like this, but I'm not sure. You could look through the
>>>>> mailing list for information on it.
>>>>> -Greg
>
> --
> Kjetil Joergensen <[email protected]>
> Operations Engineer, Medallia Inc
> Phone: +1 (650) 739-6580
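For anyone who wants to reproduce the heap numbers quoted above: they come
from tcmalloc's introspection, which ceph exposes through the tell
interface. Something like the following should work, assuming the daemon is
linked against tcmalloc (the default in these packages):

  # Dump the allocator's view of a monitor's memory (the MALLOC table
  # above); replace <id> with the monitor's id.
  ceph tell mon.<id> heap stats

  # Ask tcmalloc to hand its freelist pages back to the OS. This is what
  # collapsed the huge page heap freelist in the output above, though it
  # treats the symptom rather than the cause.
  ceph tell mon.<id> heap release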
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
