I just had my Ceph cluster, which is running 0.87.1, exhibit this behavior
(two of the three mons eat all CPU and the cluster becomes unusably slow).

It seems to be tied to deep scrubbing: the behavior surfaces almost
immediately when deep scrubbing is turned on, and with it turned off the
cluster eventually returns to normal and stays that way. I have not yet
found anything in the cluster to indicate a hardware problem.
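
In case anyone wants to reproduce the correlation, the quickest way to
toggle deep scrubbing cluster-wide is the standard flag:

    ceph osd set nodeep-scrub     # stop scheduling new deep scrubs
    ceph osd unset nodeep-scrub   # allow them again

(already-running deep scrubs still finish after the flag is set).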

Any thoughts or further insights on this subject would be appreciated.

QH

On Sat, Jul 25, 2015 at 12:31 AM, Luis Periquito <periqu...@gmail.com>
wrote:

> I think I figured it out! All 4 of the OSDs on one host (OSDs 107-110) were
> sending massive numbers of auth requests to the monitors, seemingly
> overwhelming them.
>
> The weird bit is that I removed them (osd crush remove, auth del, osd rm),
> dd'd the box and all of its disks, reinstalled, and guess what? They are
> still sending a lot of requests to the MONs... this will require some
> further investigation.
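>
> (For completeness, the removal was the usual sequence, i.e. for each of
> them something like:
>
>   ceph osd crush remove osd.107
>   ceph auth del osd.107
>   ceph osd rm 107
>
> before dd-ing and reinstalling the box.)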
>
> As this is happening during my holidays, I just disabled them, and will
> investigate further when I get back.
>
>
> On Fri, Jul 24, 2015 at 11:11 PM, Kjetil Jørgensen <kje...@medallia.com>
> wrote:
>
>> It sounds slightly similar to what I just experienced.
>>
>> One monitor out of the three seemed to essentially run one core at full
>> tilt continuously, and its virtual address space had grown to the point
>> where top started reporting it in TB. Requests hitting this monitor did
>> not get very timely responses (although I don't know whether that happened
>> consistently or only intermittently).
>>
>> I ended up re-building the monitor from the two healthy ones I had, which
>> made the problem go away for me.
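>>
>> (In case it's useful to anyone: the rebuild boils down to removing the bad
>> mon and re-creating it so it resyncs from the survivors; roughly, with the
>> mon id and paths below being placeholders rather than my exact commands:
>>
>>   # with the bad ceph-mon stopped
>>   ceph mon remove <mon-id>
>>   mv /var/lib/ceph/mon/ceph-<mon-id> /var/lib/ceph/mon/ceph-<mon-id>.old
>>   ceph auth get mon. -o /tmp/mon.keyring
>>   ceph mon getmap -o /tmp/monmap
>>   ceph-mon -i <mon-id> --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
>>
>> then add the mon back / start it and let it catch up from the healthy ones.)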
>>
>> After-the-fact inspection of the monitor I ripped out clocked it in at
>> 1.3 GB, compared to the 250 MB of the other two; after the rebuild they're
>> all comparable in size.
>>
>> In my case this started on firefly and persisted after upgrading to
>> hammer, which prompted the rebuild, since I suspected it was related to
>> "something" persistent for this particular monitor.
>>
>> I don't have much more of use to contribute to this discussion, since
>> I've more or less destroyed any evidence by rebuilding the monitor.
>>
>> Cheers,
>> KJ
>>
>> On Fri, Jul 24, 2015 at 1:55 PM, Luis Periquito <periqu...@gmail.com>
>> wrote:
>>
>>> The leveldb store is smallish: around 70 MB.
>>>
>>> I ran with debug mon = 10 for a while, but couldn't find any interesting
>>> information. I would run out of space quite quickly though, as the log
>>> partition only has 10 GB.
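>>>
>>> (For anyone following along: the debug level can be bumped at runtime with
>>> something like
>>>
>>>   ceph tell mon.<id> injectargs '--debug-mon 10'
>>>
>>> and dropped back down afterwards, rather than editing ceph.conf and
>>> restarting; <id> being whichever mon you're watching.)
>>>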
>>> On 24 Jul 2015 21:13, "Mark Nelson" <mnel...@redhat.com> wrote:
>>>
>>>> On 07/24/2015 02:31 PM, Luis Periquito wrote:
>>>>
>>>>> Now it's official, I have a weird one!
>>>>>
>>>>> I restarted one of the ceph-mons with jemalloc and it didn't make any
>>>>> difference. It's still using a lot of CPU and still not freeing up
>>>>> memory...
>>>>>
>>>>> The issue is that the cluster almost stops responding to requests, and
>>>>> if I restart the primary mon (which had almost no memory or CPU usage)
>>>>> the cluster goes back to responding to requests on its merry way.
>>>>>
>>>>> Does anyone have any idea what may be going on? The worst bit is that I
>>>>> have several clusters just like this one (well, they are smaller), and
>>>>> as we do everything with puppet they should be very similar... and all
>>>>> the other clusters are working just fine, without any issues whatsoever...
>>>>>
>>>>
>>>> We've seen cases where leveldb can't compact fast enough and memory
>>>> balloons, but it's usually associated with extreme CPU usage as well. It
>>>> would be showing up in perf though if that were the case...
>>>>
>>>>
>>>>> On 24 Jul 2015 10:11, "Jan Schermer" <j...@schermer.cz> wrote:
>>>>>
>>>>>     You don’t (shouldn’t) need to rebuild the binary to use jemalloc. It
>>>>>     should be possible to do something like
>>>>>
>>>>>     LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-osd …
>>>>>
>>>>>     The last time we tried it segfaulted after a few minutes, so YMMV
>>>>>     and be careful.
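>>>>>
>>>>>     For the mon it's the same idea, e.g. something along the lines of
>>>>>
>>>>>     LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 ceph-mon -i <id> -f
>>>>>
>>>>>     with <id> being the monitor name; how you wire the variable into
>>>>>     your init scripts is distro-specific.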
>>>>>
>>>>>     Jan
>>>>>
>>>>>>     On 23 Jul 2015, at 18:18, Luis Periquito <periqu...@gmail.com> wrote:
>>>>>>
>>>>>>     Hi Greg,
>>>>>>
>>>>>>     I've been looking at the tcmalloc issues, but those seemed to affect
>>>>>>     OSDs, and I do notice them under heavy read workloads (even after the
>>>>>>     patch and after increasing
>>>>>>     TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728). This is affecting
>>>>>>     the mon process, though.
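>>>>>>
>>>>>>     (That cache setting is just an environment variable tcmalloc reads,
>>>>>>     so it has to be exported in whatever environment the daemons start
>>>>>>     from, e.g.
>>>>>>
>>>>>>     export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
>>>>>>
>>>>>>     with the exact file being distro/packaging dependent.)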
>>>>>>
>>>>>>     Looking at perf top, I'm getting most of the CPU usage in mutex
>>>>>>     lock/unlock:
>>>>>>       5.02%  libpthread-2.19.so    [.] pthread_mutex_unlock
>>>>>>       3.82%  libsoftokn3.so        [.] 0x000000000001e7cb
>>>>>>       3.46%  libpthread-2.19.so    [.] pthread_mutex_lock
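>>>>>>
>>>>>>     (If it helps, something like "perf top -p $(pidof ceph-mon)" narrows
>>>>>>     that view down to just the mon process.)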
>>>>>>
>>>>>>     I could try to use jemalloc; are you aware of any pre-built binaries?
>>>>>>     Can I run a cluster mixing binaries built against different mallocs?
>>>>>>
>>>>>>
>>>>>>     On Thu, Jul 23, 2015 at 10:50 AM, Gregory Farnum <g...@gregs42.com> wrote:
>>>>>>
>>>>>>         On Thu, Jul 23, 2015 at 8:39 AM, Luis Periquito <periqu...@gmail.com> wrote:
>>>>>>         > The ceph-mon is already taking a lot of memory, and I ran heap
>>>>>>         > stats:
>>>>>>         > ------------------------------------------------
>>>>>>         > MALLOC:       32391696 (   30.9 MiB) Bytes in use by application
>>>>>>         > MALLOC: +  27597135872 (26318.7 MiB) Bytes in page heap freelist
>>>>>>         > MALLOC: +     16598552 (   15.8 MiB) Bytes in central cache freelist
>>>>>>         > MALLOC: +     14693536 (   14.0 MiB) Bytes in transfer cache freelist
>>>>>>         > MALLOC: +     17441592 (   16.6 MiB) Bytes in thread cache freelists
>>>>>>         > MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
>>>>>>         > MALLOC:   ------------
>>>>>>         > MALLOC: =  27794649240 (26507.0 MiB) Actual memory used (physical + swap)
>>>>>>         > MALLOC: +     26116096 (   24.9 MiB) Bytes released to OS (aka unmapped)
>>>>>>         > MALLOC:   ------------
>>>>>>         > MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
>>>>>>         > MALLOC:
>>>>>>         > MALLOC:           5683              Spans in use
>>>>>>         > MALLOC:             21              Thread heaps in use
>>>>>>         > MALLOC:           8192              Tcmalloc page size
>>>>>>         > ------------------------------------------------
>>>>>>         >
>>>>>>         > After that I ran heap release and it went back to normal:
>>>>>>         > ------------------------------------------------
>>>>>>         > MALLOC:       22919616 (   21.9 MiB) Bytes in use by application
>>>>>>         > MALLOC: +      4792320 (    4.6 MiB) Bytes in page heap freelist
>>>>>>         > MALLOC: +     18743448 (   17.9 MiB) Bytes in central cache freelist
>>>>>>         > MALLOC: +     20645776 (   19.7 MiB) Bytes in transfer cache freelist
>>>>>>         > MALLOC: +     18456088 (   17.6 MiB) Bytes in thread cache freelists
>>>>>>         > MALLOC: +    116387992 (  111.0 MiB) Bytes in malloc metadata
>>>>>>         > MALLOC:   ------------
>>>>>>         > MALLOC: =    201945240 (  192.6 MiB) Actual memory used (physical + swap)
>>>>>>         > MALLOC: +  27618820096 (26339.4 MiB) Bytes released to OS (aka unmapped)
>>>>>>         > MALLOC:   ------------
>>>>>>         > MALLOC: =  27820765336 (26531.9 MiB) Virtual address space used
>>>>>>         > MALLOC:
>>>>>>         > MALLOC:           5639              Spans in use
>>>>>>         > MALLOC:             29              Thread heaps in use
>>>>>>         > MALLOC:           8192              Tcmalloc page size
>>>>>>         > ------------------------------------------------
>>>>>>         >
>>>>>>         > So it just seems the monitor is neither returning unused memory
>>>>>>         > to the OS nor reusing already-allocated memory it deems free...
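>>>>>>         >
>>>>>>         > (For reference, the numbers above come from the tcmalloc admin
>>>>>>         > commands, i.e. something like "ceph tell mon.<id> heap stats"
>>>>>>         > and "ceph tell mon.<id> heap release", <id> being the busy mon.)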
>>>>>>
>>>>>>         Yep. This is a bug (best we can tell) in some versions of
>>>>>>         tcmalloc combined with certain distribution stacks, although I
>>>>>>         don't think we've seen it reported on Trusty (nor on a tcmalloc
>>>>>>         distribution that new) before. Alternatively, some folks are
>>>>>>         seeing tcmalloc use up lots of CPU in other scenarios involving
>>>>>>         memory return, and it may manifest like this, but I'm not sure.
>>>>>>         You could look through the mailing list for information on it.
>>>>>>         -Greg
>>>>>>
>>>>>>
>>
>>
>> --
>> Kjetil Joergensen <kje...@medallia.com>
>> Operations Engineer, Medallia Inc
>> Phone: +1 (650) 739-6580
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
