I think I figured it out! All four OSDs on one host (osd.107-110) were
sending massive numbers of auth requests to the monitors, seemingly
overwhelming them.
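
For anyone wanting to check for the same thing: the monitor admin socket
can show where the traffic is coming from. Something along these lines
should work (the mon ID "mon01" is just an example, adjust for your setup):

  # list current sessions; a pile of entries from the same OSD
  # addresses points at the offender
  ceph daemon mon.mon01 sessions

  # or briefly raise messenger debugging and grep the mon log
  ceph tell mon.mon01 injectargs '--debug-ms 1'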

The weird bit is that I removed them (osd crush remove, auth del, osd
rm), dd'd the box and all of the disks, reinstalled, and guess what?
They are still sending a lot of requests to the MONs... this will
require some further investigation.
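
For reference, the removal was the usual sequence, shown here for the
first one (and likewise for osd.108-110):

  ceph osd crush remove osd.107
  ceph auth del osd.107
  ceph osd rm 107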

As this is happening during my holidays, I just disabled them, and will
investigate further when I get back.
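
("Disabled" here just means stopping the daemons on that host and
marking them out; the exact service command depends on your init
system, roughly:

  ceph osd out 107
  stop ceph-osd id=107    # upstart; or: systemctl stop ceph-osd@107

for each of them.)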

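PS: for anyone following the memory discussion quoted below, the "heap
stats" / "heap release" mentioned there are the tcmalloc introspection
commands, e.g. (mon ID again illustrative):

  ceph tell mon.mon01 heap stats
  ceph tell mon.mon01 heap release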

On Fri, Jul 24, 2015 at 11:11 PM, Kjetil Jørgensen <[email protected]>
wrote:

> It sounds slightly similar to what I just experienced.
>
> I had one monitor out of three which seemed to essentially run one core
> at full tilt continuously, and had its virtual address space grown to
> the point where top started reporting it in TB. Requests hitting this
> monitor did not get very timely responses (although I don't know whether
> this was happening consistently or only intermittently).
>
> I ended up re-building the monitor from the two healthy ones I had, which
> made the problem go away for me.
>
> After-the-fact inspection of the monitor I ripped out clocked it in at
> 1.3 GB, compared to the 250 MB of the other two; after the rebuild
> they're all comparable in size.
>
> In my case this started out on firefly and persisted after upgrading to
> hammer, which prompted the rebuild, suspecting it was related to
> "something" persistent for this monitor.
>
> I don't have much more of use to contribute to this discussion, since
> I've more or less destroyed any evidence by re-building the monitor.
>
> Cheers,
> KJ
>
> On Fri, Jul 24, 2015 at 1:55 PM, Luis Periquito <[email protected]>
> wrote:
>
>> The leveldb is smallish: around 70 MB.
>>
>> I ran debug mon = 10 for a while, but couldn't find any interesting
>> information. I would run out of space quite quickly though, as the log
>> partition only has 10 GB.
>> On 24 Jul 2015 21:13, "Mark Nelson" <[email protected]> wrote:
>>
>>> On 07/24/2015 02:31 PM, Luis Periquito wrote:
>>>
>>>> Now it's official, I have a weird one!
>>>>
>>>> Restarted one of the ceph-mons with jemalloc and it didn't make any
>>>> difference. It's still using a lot of CPU and still not freeing up
>>>> memory...
>>>>
>>>> The issue is that the cluster almost stops responding to requests, and
>>>> if I restart the primary mon (that had almost no memory usage nor cpu)
>>>> the cluster goes back to its merry way responding to requests.
>>>>
>>>> Does anyone have any idea what may be going on? The worst bit is that I
>>>> have several clusters just like this (well, they are smaller), and as we
>>>> do everything with Puppet, they should be very similar... and all the
>>>> other clusters are working just fine, without any issues whatsoever...
>>>>
>>>
>>> We've seen cases where leveldb can't compact fast enough and memory
>>> balloons, but it's usually associated with extreme CPU usage as well. It
>>> would be showing up in perf though if that were the case...
>>>
>>>
>>>> On 24 Jul 2015 10:11, "Jan Schermer" <[email protected]> wrote:
>>>>
>>>>     You don’t (shouldn’t) need to rebuild the binary to use jemalloc. It
>>>>     should be possible to do something like
>>>>
>>>>     LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd …
>>>>
>>>>     The last time we tried it segfaulted after a few minutes, so YMMV
>>>>     and be careful.
>>>>
>>>>     Jan
>>>>
>>>>>     On 23 Jul 2015, at 18:18, Luis Periquito <[email protected]> wrote:
>>>>>
>>>>>     Hi Greg,
>>>>>
>>>>>     I've been looking at the tcmalloc issues, but those seemed to
>>>>>     affect OSDs, and I do notice it in heavy read workloads (even
>>>>>     after the patch and increasing
>>>>>     TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This is
>>>>>     affecting the mon process, though.
>>>>>
>>>>>     Looking at perf top, I'm getting most of the CPU usage in mutex
>>>>>     lock/unlock:
>>>>>       5.02%  libpthread-2.19.so    [.] pthread_mutex_unlock
>>>>>       3.82%  libsoftokn3.so        [.] 0x000000000001e7cb
>>>>>       3.46%  libpthread-2.19.so    [.] pthread_mutex_lock
>>>>>
>>>>>     I could try to use jemalloc; are you aware of any built binaries?
>>>>>     Can I mix a cluster with different malloc binaries?
>>>>>
>>>>>
>>>>>     On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum
>>>>>     <[email protected]> wrote:
>>>>>
>>>>>         On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito
>>>>>         <[email protected]> wrote:
>>>>>         > The ceph-mon is already taking a lot of memory, and I ran
>>>>>         > heap stats:
>>>>>         > ------------------------------------------------
>>>>>         > MALLOC:       32391696 (   30.9 MiB) Bytes in use by application
>>>>>         > MALLOC: +  27597135872 (26318.7 MiB) Bytes in page heap freelist
>>>>>         > MALLOC: +     16598552 (   15.8 MiB) Bytes in central cache freelist
>>>>>         > MALLOC: +     14693536 (   14.0 MiB) Bytes in transfer cache freelist
>>>>>         > MALLOC: +     17441592 (   16.6 MiB) Bytes in thread cache freelists
>>>>>         > MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
>>>>>         > MALLOC:   ------------
>>>>>         > MALLOC: =  27794649240 (26507.0 MiB) Actual memory used (physical + swap)
>>>>>         > MALLOC: +     26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
>>>>>         > MALLOC:   ------------
>>>>>         > MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
>>>>>         > MALLOC:
>>>>>         > MALLOC:           5683              Spans in use
>>>>>         > MALLOC:             21              Thread heaps in use
>>>>>         > MALLOC:           8192              Tcmalloc page size
>>>>>         > ------------------------------------------------
>>>>>         >
>>>>>         > After that I ran the heap release and it went back to normal:
>>>>>         > ------------------------------------------------
>>>>>         > MALLOC:       22919616 (   21.9 MiB) Bytes in use by application
>>>>>         > MALLOC: +      4792320 (    4.6 MiB) Bytes in page heap freelist
>>>>>         > MALLOC: +     18743448 (   17.9 MiB) Bytes in central cache freelist
>>>>>         > MALLOC: +     20645776 (   19.7 MiB) Bytes in transfer cache freelist
>>>>>         > MALLOC: +     18456088 (   17.6 MiB) Bytes in thread cache freelists
>>>>>         > MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
>>>>>         > MALLOC:   ------------
>>>>>         > MALLOC: =    201945240 (  192.6 MiB) Actual memory used (physical + swap)
>>>>>         > MALLOC: +  27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
>>>>>         > MALLOC:   ------------
>>>>>         > MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
>>>>>         > MALLOC:
>>>>>         > MALLOC:           5639              Spans in use
>>>>>         > MALLOC:             29              Thread heaps in use
>>>>>         > MALLOC:           8192              Tcmalloc page size
>>>>>         > ------------------------------------------------
>>>>>         >
>>>>>         > So it just seems the monitor is not returning unused memory
>>>>>         > to the OS, or reusing already-allocated memory it deems free...
>>>>>
>>>>>         Yep. This is a bug (best we can tell) in some versions of tcmalloc
>>>>>         combined with certain distribution stacks, although I don't think
>>>>>         we've seen it reported on Trusty (nor on a tcmalloc distribution that
>>>>>         new) before. Alternatively, some folks are seeing tcmalloc use up lots
>>>>>         of CPU in other scenarios involving memory return, and it may manifest
>>>>>         like this, but I'm not sure. You could look through the mailing list
>>>>>         for information on it.
>>>>>         -Greg
>>>>>
>>>>>
>>
>
>
> --
> Kjetil Joergensen <[email protected]>
> Operations Engineer, Medallia Inc
> Phone: +1 (650) 739-6580
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
