Incidentally, why was "mallinfo" removed from memcache 1.4.0? Even without it being 64-bit aware, it still provided some useful data that I wasn't able to get via other means in our 1.2.6 binaries.
Mike

On Thu, Jul 16, 2009 at 03:13, Mike Lambert <[email protected]> wrote:
> Basically, process memory was growing very slowly over time,
> eventually causing machine swapping. It was leveling out (not a leak),
> but at a level higher than we expected, even with the hashtable and
> maxbytes accounted for. So I was poking around at memory usage, and
> decided that fragmentation was to blame.
>
> Looking again right now at a machine configured with -m 6000 (so
> ~6GB), I see "stats maps" showing a 512MB hashtable and a 7.5GB heap.
>
> "stats malloc" (which isn't 64-bit aware) gives:
> STAT mmapped_space 564604928  # this includes the 512MB hashtable
> STAT arena_size -1058820096
> STAT total_alloc -2040194320
> STAT total_free 981374224
> where arena_size = total_alloc + total_free.
>
> Knowing that the total size of the heap is 7.5GB, I can derive that
> real_arena_size = -1058820096 + 2**32 * 2 = 7531114496. Doing
> total_free/real_arena_size gives 13%, which is my estimate for
> free-but-unallocated RAM. (Whether it's free due to fragmentation or
> just not yet allocated is hard to tell, but that number is still very
> high.)
>
> Alternately, one could ask why we have a 7.5GB heap for a 6GB
> memcache... why so much RAM? I calculated 100MB-200MB for 7600
> connections plus some various free lists, but I was running into the
> problem that total_free indicates there are still 981MB of unallocated
> RAM in the heap. So I think at the time I concluded this was due to
> fragmentation.
>
> We solved our problem by reducing the amount of RAM we gave to
> memcache so we didn't swap, but in theory getting an extra 10-13% of
> RAM out of our memcaches sounds like a great idea. And so, given my
> fragmentation conclusion, I was looking for ways to reduce that.
>
> Thoughts? Is there perhaps another explanation for the data above?
>
> Thanks,
> Mike
>
> On Wed, Jul 15, 2009 at 19:40, Matt Ingenthron <[email protected]> wrote:
>>
>> Hi Mike,
>>
>> Mike Lambert wrote:
>>>
>>> Trond, any thoughts?
>>>
>>
>> Trond is actually on vacation, but I did steal a few cycles of his time and
>> asked about this.
>>>
>>> I'd like to double-check that there isn't a reason we can't support
>>> preallocation without getpagesizes() before attempting to manually
>>> patch memcache and play with our production system here.
>>>
>>
>> There's no reason you can't do that. There may be a slightly cleaner
>> integration approach Trond and I talked through. I'll try to code that up
>> here in the next few days... but for now you may try your approach to see if
>> it helps alleviate the issue you were seeing.
>>
>> Incidentally, how did the memory fragmentation manifest itself on your
>> system? I mean, could you see any effect on apps running on the system?
>>
>>> Thanks,
>>> Mike
>>>
>>> On Jul 13, 8:38 pm, Mike Lambert <[email protected]> wrote:
>>>>
>>>> On Jul 10, 1:37 pm, Matt Ingenthron <[email protected]> wrote:
>>>>>
>>>>> Mike Lambert wrote:
>>>>>>
>>>>>> Currently the -L flag is only enabled if
>>>>>> HAVE_GETPAGESIZES && HAVE_MEMCNTL. I'm curious what the motivation is
>>>>>> for something like that? In our experience, for some memcache pools we
>>>>>> end up fragmenting memory due to the repeated allocation of 1MB slabs
>>>>>> around all the other hashtables and free lists going on. We know we
>>>>>> want to allocate all memory upfront, but can't seem to do that on a
>>>>>> Linux system.
>>>>>
>>>>> The primary motivation was more about not beating up the TLB cache on
>>>>> the CPU when running with large heaps. There are users with large heaps
>>>>> already, so this should help if the underlying OS supports large pages.
>>>>> TLB cache sizes are getting bigger in CPUs, but virtualization is more
>>>>> common and memory heaps are growing faster.
>>>>> I'd like to have some empirical data on how big a difference the -L flag
>>>>> makes, but that assumes a workload profile.
>>>>> I should be able to hack
>>>>> one up and do this with memcachetest, but I've just not done it yet. :)
>>>>>
>>>>>> To put it more concretely, here is a proposed change to make -L do a
>>>>>> contiguous preallocation even on machines without getpagesizes tuning.
>>>>>> My memcached server doesn't seem to crash, but I'm not sure if that's
>>>>>> a proper litmus test. What are the pros/cons of doing something like
>>>>>> this?
>>>>>
>>>>> This feels more related to the -k flag, and it should probably be using
>>>>> madvise() in there somewhere too. It wouldn't necessarily be a bad idea
>>>>> to separate these. I don't know that the day after 1.4.0 is the day to
>>>>> redefine -L, though, but it's not necessarily bad. We should wait for
>>>>> Trond's response to see what he thinks about this, since he implemented
>>>>> it. :)
>>>>
>>>> Haha, yeah, the release of 1.4.0 reminded me I wanted to send this
>>>> email. Sorry for the bad timing.
>>>>
>>>> -k keeps the memory from getting paged out to disk (which is a very
>>>> good thing in our case).
>>>> -L appears to me (who isn't aware of what getpagesizes does) to be
>>>> related to preallocation with big allocations, which I thought was
>>>> what I wanted.
>>>>
>>>> If you want, I'd be just as happy with a -A flag that turns on
>>>> preallocation, but without any of the getpagesizes() tuning. It'd force
>>>> one big slab allocation and that's it.
>>>>
>>>>> Also, I did some testing with this (-L) some time back (admittedly on
>>>>> OpenSolaris), and the actual behavior will vary based on the memory
>>>>> allocation library you're using and what it does with the OS
>>>>> underneath. I didn't try Linux variations, but that may be worthwhile
>>>>> for you. IIRC, default malloc would wait for a page fault to do the
>>>>> actual memory allocation, so there'd still be risk of fragmentation.
>>>>
>>>> We do use Linux, but haven't tested in production with my modified -L
>>>> patch.
>>>> What I *have* noticed is that when we allocate a 512MB
>>>> hashtable, it shows up in Linux as an mmap-ed contiguous block of
>>>> memory. From http://m.linuxjournal.com/article/6390: "For very
>>>> large requests, malloc() uses the mmap() system call to find
>>>> addressable memory space. This process helps reduce the negative
>>>> effects of memory fragmentation when large blocks of memory are freed
>>>> but locked by smaller, more recently allocated blocks lying between
>>>> them and the end of the allocated space."
>>>>
>>>> I was hoping to get the same large mmap for all our slabs, out of the
>>>> way in a different address space in a way that didn't interfere with
>>>> the actual memory allocator itself, so that the Linux allocator could
>>>> then focus on balancing just the small allocations without any page
>>>> waste.
>>>>
>>>> Thanks,
>>>> Mike
