On Jul 10, 1:37 pm, Matt Ingenthron <[email protected]> wrote:
> Mike Lambert wrote:
> > Currently the -L flag is only enabled if
> > HAVE_GETPAGESIZES&&HAVE_MEMCNTL. I'm curious what the motivation is
> > for something like that? In our experience, some of our memcache
> > pools end up fragmenting memory due to the repeated allocation of 1MB
> > slabs interleaved with all the other hashtable and free-list
> > allocations going on. We know we want to allocate all memory up
> > front, but can't seem to do that on a Linux system.
>
> The primary motivation was more about not beating up the TLB cache on
> the CPU when running with large heaps. There are users with large heaps
> already, so this should help if the underlying OS supports large pages.
> TLB cache sizes are getting bigger in CPUs, but virtualization is more
> common and memory heaps are growing faster.
>
> I'd like to have some empirical data on how big a difference the -L flag
> makes, but that assumes a workload profile. I should be able to hack
> one up and do this with memcachetest, but I've just not done it yet. :)
>
> > To put it more concretely, here is a proposed change to make -L do a
> > contiguous preallocation even on machines without getpagesizes tuning.
> > My memcached server doesn't seem to crash, but I'm not sure if that's
> > a proper litmus test. What are the pros/cons of doing something like
> > this?
>
> This feels more related to the -k flag, and it should probably be using
> madvise() in there somewhere too. It wouldn't necessarily be a bad idea
> to separate these. I don't know that the day after 1.4.0 is the day to
> redefine -L, but it's not necessarily bad either. We should wait for
> Trond's response to see what he thinks about this, since he implemented
> it. :)
Haha, yeah, the release of 1.4.0 reminded me I wanted to send this
email. Sorry for the bad timing.
-k keeps the memory from getting paged out to disk (which is a very
good thing in our case).
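For reference, my understanding is that on Linux the pinning that -k
does comes down to an mlockall() call at startup. A minimal sketch of
that mechanism (not the actual memcached code):

#include <sys/mman.h>
#include <stdio.h>

int main(void) {
    /* Lock all current and future pages into RAM so the kernel never
     * pages the cache out to swap.  Needs RLIMIT_MEMLOCK headroom
     * (or root / CAP_IPC_LOCK). */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    /* ... allocate and use the cache as normal ... */
    return 0;
}
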
-L appears to me (not knowing offhand what getpagesizes() does) to be
about preallocating memory in big allocations, which is what I thought
I wanted.
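For my own education I skimmed the Solaris man pages; my rough
understanding is that the getpagesizes()/memcntl() pair that -L is
gated on is for asking the OS to back the heap with the largest
supported page size, roughly along these lines (Solaris-only sketch,
not the actual code):

/* Solaris-only sketch: find the largest supported page size and ask the
 * kernel to use it for future heap (bss/brk) growth. */
#include <sys/mman.h>

static int advise_large_pages(void) {
    size_t sizes[32];
    int avail = getpagesizes(sizes, 32);   /* supported page sizes */
    if (avail <= 0)
        return -1;

    size_t max = sizes[0];
    for (int i = 1; i < avail; i++)
        if (sizes[i] > max)
            max = sizes[i];

    struct memcntl_mha arg = {0};
    arg.mha_cmd = MHA_MAPSIZE_BSSBRK;      /* apply to the heap segment */
    arg.mha_pagesize = max;
    return memcntl(0, 0, MC_HAT_ADVISE, (caddr_t)&arg, 0, 0);
}
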
If you want, I'd be just as happy with a -A flag that turns on
preallocation without any of the getpagesizes() tuning. It'd force
one big slab allocation up front, and that's it.
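To make that concrete, something like the sketch below is what I have
in mind; the flag name, the helpers, and the globals are all
hypothetical, and the 1MB page size is just the usual slab page size:

/* Hypothetical "-A" behaviour: grab the whole cache limit up front in one
 * contiguous block and hand out 1MB slab pages from it, instead of
 * malloc()ing each slab page on demand. */
#include <stdlib.h>

#define SLAB_PAGE_SIZE (1024 * 1024)

static char  *slab_base  = NULL;   /* start of the preallocated region */
static size_t slab_used  = 0;      /* bytes handed out so far */
static size_t slab_limit = 0;      /* total preallocated bytes */

int slabs_prealloc_all(size_t mem_limit) {   /* hypothetical helper */
    slab_base = malloc(mem_limit);
    if (slab_base == NULL)
        return -1;
    slab_limit = mem_limit;
    return 0;
}

void *slabs_next_page(void) {                /* hypothetical helper */
    if (slab_used + SLAB_PAGE_SIZE > slab_limit)
        return NULL;               /* preallocated region exhausted */
    void *page = slab_base + slab_used;
    slab_used += SLAB_PAGE_SIZE;
    return page;
}
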
> Also, I did some testing with this (-L) some time back (admittedly on
> OpenSolaris) and the actual behavior will vary based on the memory
> allocation library you're using and what it does with the OS
> underneath. I didn't try Linux variations, but that may be worthwhile
> for you. IIRC, the default malloc would wait for a page fault to do the
> actual memory allocation, so there'd still be a risk of fragmentation.
We do use Linux, but haven't tested in production with my modified -L
patch. What I *have* noticed is that when we allocate a 512MB
hashtable, it shows up in Linux as an mmap-ed contiguous block of
memory. From http://m.linuxjournal.com/article/6390: "For very
large requests, malloc() uses the mmap() system call to find
addressable memory space. This process helps reduce the negative
effects of memory fragmentation when large blocks of memory are freed
but locked by smaller, more recently allocated blocks lying between
them and the end of the allocated space."
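If anyone wants to poke at that cutover, my understanding is that it's
glibc's M_MMAP_THRESHOLD (128KB by default, and adjusted dynamically in
newer glibcs) that decides when malloc() switches to mmap(); it can be
pinned explicitly, e.g.:

#include <malloc.h>
#include <stdlib.h>

int main(void) {
    /* Requests at or above the threshold are satisfied with a private
     * anonymous mmap() and handed back to the kernel on free(); smaller
     * requests come from the normal brk heap.  Pinning it stops newer
     * glibcs from adjusting it dynamically. */
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);

    void *hashtable = malloc(512UL * 1024 * 1024);  /* shows up as one mmap */
    free(hashtable);
    return 0;
}
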
I was hoping to get the same kind of large mmap for all our slabs, off
in a different part of the address space where it doesn't interfere
with the memory allocator itself, so that the Linux allocator can then
focus on balancing just the small allocations without any page waste.
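In other words, roughly this (untested sketch, hypothetical helper
name):

#include <stddef.h>
#include <sys/mman.h>
#include <stdio.h>

/* Map the entire slab arena as one contiguous anonymous mapping, entirely
 * outside the heap that malloc() uses for the small allocations (hash
 * buckets, free lists, connection structures, ...). */
static void *map_slab_arena(size_t mem_limit) {
    void *base = mmap(NULL, mem_limit, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return base;
}
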
Thanks,
Mike