Incidentally, why was "mallinfo" removed from memcache 1.4.0? Even without it being 64-bit aware, it still provided some useful data that I wasn't able to get via other means in our 1.2.6 binaries.
Mike

On Thu, Jul 16, 2009 at 03:13, Mike Lambert <[email protected]> wrote:
> Basically, process memory was growing very slowly over time,
> eventually causing machine swapping. It was leveling out (not a leak),
> but at a level higher than we expected, even with the hashtable and
> maxbytes accounted for. So I was poking around at memory usage, and
> decided that fragmentation was to blame.
>
> Looking again right now at a machine configured with -m 6000 (so
> ~6GB), I see "stats maps" showing a 512MB hashtable and a 7.5GB heap.
>
> "stats malloc" (which isn't 64-bit aware) gives:
> STAT mmapped_space 564604928  # this includes the 512MB hashtable
> STAT arena_size -1058820096
> STAT total_alloc -2040194320
> STAT total_free 981374224
> where arena_size = total_alloc + total_free.
>
> Knowing that the total size of the heap is 7.5GB, I can derive that
> real_arena_size = -1058820096 + 2**32 * 2 = 7531114496. Doing
> total_free/real_arena_size gives 13%, which is my estimate for
> free-but-unallocated RAM. (Whether it's free due to fragmentation or
> just not yet allocated is hard to tell, but that number is still very
> high.)
>
> Alternately, one could ask why we have a 7.5GB heap for a 6GB
> memcache... why so much RAM? I calculated 100MB-200MB for 7600
> connections plus some various free lists, but I was running into the
> problem that total_free indicates there are still 981MB of unallocated
> RAM in the heap. So I think at the time I concluded this was due to
> fragmentation.
>
> We solved our problem by reducing the amount of RAM we gave to
> memcache so we didn't swap, but in theory getting an extra 10-13% of
> RAM out of our memcaches sounds like a great idea. And so, given my
> fragmentation conclusion, I was looking for ways to reduce that.
>
> Thoughts? Is there perhaps another explanation for the data above?
>
> Thanks,
> Mike
>
> On Wed, Jul 15, 2009 at 19:40, Matt Ingenthron <[email protected]> wrote:
>>
>> Hi Mike,
>>
>> Mike Lambert wrote:
>>>
>>> Trond, any thoughts?
>>>
>>
>> Trond is actually on vacation, but I did steal a few cycles of his time and
>> asked about this.
>>>
>>> I'd like to double-check that there isn't a reason we can't support
>>> preallocation without getpagesizes() before attempting to manually
>>> patch memcache and play with our production system here.
>>>
>>
>> There's no reason you can't do that. There may be a slightly cleaner
>> integration approach Trond and I talked through. I'll try to code that up
>> here in the next few days... but for now you may try your approach to see if
>> it helps alleviate the issue you were seeing.
>>
>> Incidentally, how did the memory fragmentation manifest itself on your
>> system? I mean, could you see any effect on apps running on the system?
>>
>>> Thanks,
>>> Mike
>>>
>>> On Jul 13, 8:38 pm, Mike Lambert <[email protected]> wrote:
>>>>
>>>> On Jul 10, 1:37 pm, Matt Ingenthron <[email protected]> wrote:
>>>>>
>>>>> Mike Lambert wrote:
>>>>>>
>>>>>> Currently the -L flag is only enabled if
>>>>>> HAVE_GETPAGESIZES && HAVE_MEMCNTL. I'm curious what the motivation is
>>>>>> for something like that? In our experience, for some memcache pools we
>>>>>> end up fragmenting memory due to the repeated allocation of 1MB slabs
>>>>>> around all the other hashtables and free lists going on. We know we
>>>>>> want to allocate all memory upfront, but can't seem to do that on a
>>>>>> Linux system.
>>>>>
>>>>> The primary motivation was more about not beating up the TLB cache on
>>>>> the CPU when running with large heaps. There are users with large heaps
>>>>> already, so this should help if the underlying OS supports large pages.
>>>>> TLB cache sizes are getting bigger in CPUs, but virtualization is more
>>>>> common and memory heaps are growing faster.
>>>>> I'd like to have some empirical data on how big a difference the -L flag
>>>>> makes, but that assumes a workload profile.
>>>>> I should be able to hack
>>>>> one up and do this with memcachetest, but I've just not done it yet. :)
>>>>>
>>>>>> To put it more concretely, here is a proposed change to make -L do a
>>>>>> contiguous preallocation even on machines without getpagesizes tuning.
>>>>>> My memcached server doesn't seem to crash, but I'm not sure if that's
>>>>>> a proper litmus test. What are the pros/cons of doing something like
>>>>>> this?
>>>>>
>>>>> This feels more related to the -k flag, and it should probably be using
>>>>> madvise() in there somewhere too. It wouldn't necessarily be a bad idea
>>>>> to separate these. I don't know that the day after 1.4.0 is the day to
>>>>> redefine -L, though, but it's not necessarily bad. We should wait for
>>>>> Trond's response to see what he thinks about this, since he implemented
>>>>> it. :)
>>>>
>>>> Haha, yeah, the release of 1.4.0 reminded me I wanted to send this
>>>> email. Sorry for the bad timing.
>>>>
>>>> -k keeps the memory from getting paged out to disk (which is a very
>>>> good thing in our case).
>>>> -L appears to me (who isn't aware of what getpagesizes does) to be
>>>> related to preallocation with big allocations, which I thought was
>>>> what I wanted.
>>>>
>>>> If you want, I'd be just as happy with a -A flag that turns on
>>>> preallocation, but without any of the getpagesizes() tuning. It'd force
>>>> one big slab allocation and that's it.
>>>>
>>>>> Also, I did some testing with this (-L) some time back (admittedly on
>>>>> OpenSolaris), and the actual behavior will vary based on the memory
>>>>> allocation library you're using and what it does with the OS
>>>>> underneath. I didn't try Linux variations, but that may be worthwhile
>>>>> for you. IIRC, default malloc would wait for a page fault to do the
>>>>> actual memory allocation, so there'd still be risk of fragmentation.
>>>>
>>>> We do use Linux, but haven't tested in production with my modified -L
>>>> patch.
>>>> What I *have* noticed is that when we allocate a 512MB
>>>> hashtable, it shows up in Linux as an mmap-ed contiguous block of
>>>> memory. From http://m.linuxjournal.com/article/6390: "For very
>>>> large requests, malloc() uses the mmap() system call to find
>>>> addressable memory space. This process helps reduce the negative
>>>> effects of memory fragmentation when large blocks of memory are freed
>>>> but locked by smaller, more recently allocated blocks lying between
>>>> them and the end of the allocated space."
>>>>
>>>> I was hoping to get the same large mmap for all our slabs, out of the
>>>> way in a different address space in a way that didn't interfere with
>>>> the actual memory allocator itself, so that the Linux allocator could
>>>> then focus on balancing just the small allocations without any page
>>>> waste.
>>>>
>>>> Thanks,
>>>> Mike
